
The Future Becomes Reality

Imagine being able to look at a car’s engine—almost as if you had X-ray vision—without lifting the hood. Or asking a handheld device how to fix an oil leak even if you have no idea what you’re doing. Now extrapolate that to jet engines or oil rigs, the complexities of which can be mind-boggling to novice technicians or those tackling new problems. By using augmented reality (AR), however, those novices can quickly become experts.

“Say your dishwasher breaks down. You order a part but don’t know how to install it. So instead of hiring an expert to do it for you, how cool would it be if you had AR glasses that would guide you through the process?”

—Tal Drory, senior manager, Multimedia Analytics, IBM Research

At IBM Research in Haifa, Israel, Tal Drory, senior manager of Multimedia Analytics, and Dr. Ethan Hadar, manager of Cognitive Vision and Augmented Reality, are helping businesses and consumers alike take advantage of AR to conduct complicated technical tasks without applicable experience. This capability will help companies and consumers save money, create safer working environments, and become more productive and efficient.

IBM Systems Magazine (ISM): I guess we should start with the obvious question: What is augmented reality?
Tal Drory (TD): Essentially, augmented reality is a technology that allows us to augment the field of view of the person we’re assisting. For example, imagine you’re pointing a mobile device’s camera at a retail shelf in a store. You see reality on the device’s screen, but it can be augmented with additional information, such as detailed product descriptions, promotions or even personalized information: you like this one; you’re allergic to that one; you bought this previously. Reality can also be augmented with audio or speech, but when people speak about augmented reality, they usually mean augmenting vision by overlaying it with specific information.
EH: I would add that augmented reality is a way to extend and expand people’s vision. It gives people with ordinary eyesight additional capabilities, sort of like Superman’s X-ray vision. For example, it can be used to see the internal elements of an engine by “looking” through the engine cover.

ISM: What’s the difference between augmented reality and virtual reality (VR)?
TD: Virtual reality is where you don’t augment reality—you wear a VR headset, and what you see through it is usually unrelated to the reality that surrounds you. You’re not seeing anything beyond what’s projected to you in the VR headset. With AR, you see reality with additional information overlaid on your field of view.
EH: Many people mix up virtual reality and augmented reality. There is a borderline technology called “mixed reality” that we also deal with. The X-ray vision example is actually mixed reality; we’re adding virtual things to existing reality. Imagine you’re looking at your car and wondering what it might look like if you changed the color from white to red, swapped the tires for a different version and added a spoiler to the back. You put on your AR glasses, run the software and see what it could look like. The car is real—you can actually touch it—but the wheels and the spoiler aren’t.
TD: If I had to draw the line, I’d say that mixed reality and AR are one side, which is broadly speaking AR, and then you have VR, which is completely virtual.

ISM: How can this type of technology benefit businesses?
TD: Let’s focus mainly on the industrial setting: maintenance, repair, assembly lines. Technicians—we call them “cognitive technicians”—can use our AR-based solutions to be more productive and more efficient, to work more safely and to have their work guided. Imagine a novice technician in the field who needs to replace a piece of equipment while it’s raining. The technician doesn’t want to read, or doesn’t have access to, the user’s guide. They have smart glasses or a mobile device, and an expert at the back end who, via augmented reality, can guide them through the work, step by step. There might even be a cognitive system at the back end rather than a human expert. This is just one example of how this technology can help technical workers become more efficient.
EH: Our research is conducted with actual customers in real industries with real problems. These may involve manufacturing, equipment assembly or maintenance and support. Industries are facing several challenges.

First, it’s very costly to maintain remote systems and support them. Our clients need to reduce costs, for example, by prolonging the sustainability of their existing infrastructure and reducing the cost of hardware.

Second, they have an aging but expert workforce that would like to retire. Unfortunately, the new people coming on board may never have seen a problem that the expert has already solved. How can we preserve that knowledge? How can we plan for that? How do we retain the information and reduce the cost of labor, travel and operation time—while maintaining quality—and do it very quickly and very efficiently?

Think about an airplane parked at a gate with a mechanical issue. The penalty for not getting that plane airborne is huge for the manufacturer, the airline, the airport—everybody suffers. The mission is to fix, very quickly, a part you have never seen before, and to do this efficiently, accurately and on time. Imagine an AR system that shows you an X-ray view of the part and where it sits in the cabin—for example, behind some panels. The system guides you through disassembling the elements around the part to replace it—all by talking to the system and getting the entire storyboard. Similarly, think of fixing issues inside railway cabins, think of a server in a data center, think of any equipment you can imagine. There’s always a problem that needs to be solved, yet the expert isn’t there. Can it be solved by inexperienced technicians? Can it be solved quickly and accurately? That’s where we step in.

TD: You can even use this in your own home. Say your dishwasher breaks down. You order a part but don’t know how to install it. So instead of hiring an expert to do it for you, how cool would it be if you had AR glasses that would guide you through the process? This technology is applicable for technical operations and for consumers.

We have a very nice demo where you enter your car with a mobile device—an iPad, in this case—and scan the console. You can then see—augmented on your video screen—instructions on how to operate the radio and the air conditioning. Then you can interact with it by tapping on the instructions and asking questions. You no longer have to slog through your car owner’s manual. You can simply use this application on your mobile device to understand how to operate your car, or even interact with it in the future.
EH: When you buy a new car today, you may not actually know where the window-washer reservoir is, where to add coolant, or how to top off your oil. I looked for my spare tire for 30 minutes—only to find out the manufacturer doesn’t include one.

ISM: What type of technology supports AR?
Beyond the computer vision technology that supports our core AR capabilities, natural language interaction also plays a big role in our solutions. Think of when you sit down and order a cup of coffee. Ten minutes later, you’ll ask the waiter for “another one, please.” The waiter understands that you mean another cup of coffee; there’s history and context. How can a machine understand that? How can a machine remember, when you say “give me another one,” that you’re referring to a cup of coffee? What if you point at your empty cup without saying anything? There’s a lot of complex context that needs to be maintained.

To tackle this challenge, you need to talk to the machine. It’s imperative to have some voice-to-text understanding so a cognitive solution for natural-language processing can understand what you’re saying—even if you’re not using full sentences. This is basically what you do with the waiter at the coffee shop. By engaging in conversation, and having a history of this engagement, you have context, including visual, textual and gestural. All of those are part of our own contribution to Watson* services. We know how to recognize that this is the coffee, this is the cup, this is the coffee shop.
TD: Let’s go back to the dishwasher in your kitchen. Say it malfunctioned. You point to something in the dishwasher and ask, “What is this?” The system answers and then you ask, “How do I replace it?” The system understands that you’re referring to a part that you just asked about or pointed to. This is the context in which a jet-engine technician or an automotive technician is working, pointing to a part and asking how to replace it or order a new one.
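The context tracking described in these answers—remembering what was mentioned or pointed at so that a follow-up like “another one” or “how do I replace it?” can be resolved—can be illustrated with a minimal sketch. This is a hypothetical toy example, not IBM’s implementation; the class and phrase list are inventions for illustration:

```python
# Hypothetical sketch of conversational context tracking (not IBM's
# implementation): remember entities the user named or pointed at, so
# that vague follow-up references can be resolved against the history.

class DialogueContext:
    def __init__(self):
        self.history = []  # entities mentioned so far, most recent last

    def mention(self, entity):
        """Record an entity the user named or pointed at."""
        self.history.append(entity)

    def resolve(self, utterance):
        """Resolve a vague reference against the conversation history."""
        vague_phrases = ("another one", "that one", "this one")
        if any(p in utterance.lower() for p in vague_phrases) and self.history:
            return self.history[-1]  # most recently mentioned entity
        return None                  # nothing vague to resolve

ctx = DialogueContext()
ctx.mention("cup of coffee")                 # user ordered a coffee
print(ctx.resolve("Another one, please."))   # -> cup of coffee
```

A real system would fuse this textual history with visual input (what the camera sees) and gestures (what the user points at), feeding all three into the same context store—which is the “cognitive” fusion Hadar describes next.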
EH: That’s the augmented reality aspect. At the back end, you need to have a cognitive system that can put it all together. It needs to understand your gestures, your text, your spoken language, what your intentions are and then verify that this is what you want. It looks at the results and uses deep learning to keep improving all the time. All of that is what we call cognitive-augmented reality.