In part one, we looked at the evolution of artificial intelligence from the Stone Age to the 20th century. In this installment, we’ll look at the rise of expert systems.
The End of the First AI Winter
The first AI Winter ended with the rise of expert systems. These systems are designed to emulate the decision-making process of human practitioners in a given field: medicine, the law, engineering and finance, among others. What happens is that an expert system builds a set of rules that are defined by human practitioners; these rules become a framework—more commonly, a “domain”—in which the computer can now be asked a question. It will then draw on the knowledge domain and give an answer that is, hopefully, apropos. The knowledge domains are kept purposefully small, in order to increase the likelihood of a correct answer. It seems a little like cheating, but at the time this was a necessary tactic to avoid two things:
1. The limits of the hardware at the time in terms of CPU, memory and storage capacities
2. A problem in the quest for Artificial General Intelligence (read that as “HAL 9000”) called the “Common Sense Knowledge Problem,” which remains unsolved to this day
Developers realized that if they adhered to the “smaller is better” rule and didn’t let their grand dreams of an artificial sentient overtake practicalities, they could produce nicely functional expert systems. A good example of this was the implementation by the Digital Equipment Corporation (“DEC” for short) of the XCON system. XCON stood for “eXpert CONfigurer” and aided in the ordering of DEC's VAX computer systems by automatically selecting the computer system components based on the customer's requirements. XCON proved so successful that many other companies wanted to get in on the action, and soon, billions of dollars were being spent to create more and diverse expert systems. Me? I didn’t get very far in my own expert system programming effort, but taking LISP for a test drive opened up an avenue of study that I’ve maintained to this day.
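To make the rule-based idea concrete, here is a minimal Python sketch of a tiny expert system that expands a customer's requirements into a parts list. The rule names and facts are invented for illustration and are not taken from XCON's actual rule base; the loop is a simple "forward chaining" scheme, which is one common way such systems apply their rules.

```python
# Hypothetical rules: each pairs a set of required facts with one conclusion.
RULES = [
    ({"needs_graphics"}, "add_graphics_board"),
    ({"users_over_50"}, "add_extra_memory"),
    ({"add_graphics_board", "add_extra_memory"}, "add_larger_power_supply"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no rule can add a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


order = forward_chain({"needs_graphics", "users_over_50"}, RULES)
print(sorted(order))
```

Notice how small the "domain" is: the system knows nothing outside its three rules, which is exactly the trade-off described above.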
The 1980s: A Decade of Breakthroughs
Arguably, the most important development in AI to come out of the ’80s, at least insofar as neural networks were concerned, was the backpropagation algorithm.
Backpropagation allows neural networks to learn complex patterns and relationships in data. It works by adjusting the weights of the connections in the neural network, based on the difference between the answer the network produces and the actual answer. For those of you who took some higher math, backpropagation uses the chain rule of calculus to work out how much each weight contributed to that difference. It occurs in two phases. In phase one, called forward propagation, a question (the “input data”) is fed through the network, and the output is calculated. Phase two is backpropagation proper, where the error between the predicted output and the actual output is propagated back through the network, and the weights of the connections are adjusted to reduce this error. Backpropagation is used to train the networks behind applications like computer vision, natural language processing and speech recognition. The math behind backpropagation can look intimidating, and its finer points take real effort to master, but the way I always think about backpropagation is as a form of error checking and correction in a neural network; a simplistic but effective way to remember what the algorithm does.
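For readers who would like to see the two phases in code, here is a minimal Python sketch. It assumes the smallest network I could get away with: one input, one hidden neuron with a sigmoid activation, and one output weight. The starting weights, learning rate and target value are arbitrary choices made for illustration; real networks have many more weights, but the chain-rule bookkeeping is the same idea.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w1, w2, x, target, lr=0.5):
    # Phase one, forward propagation: feed the input through the network.
    h = sigmoid(w1 * x)          # hidden neuron output
    y = w2 * h                   # network output
    error = y - target
    loss = 0.5 * error ** 2

    # Phase two, backpropagation: use the chain rule to find how much
    # each weight contributed to the error.
    dloss_dw2 = error * h
    dloss_dh = error * w2
    dloss_dw1 = dloss_dh * h * (1.0 - h) * x   # sigmoid'(z) = h * (1 - h)

    # Nudge each weight in the direction that reduces the error.
    return w1 - lr * dloss_dw1, w2 - lr * dloss_dw2, loss


w1, w2 = 0.5, -0.3               # arbitrary starting weights
losses = []
for _ in range(50):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=0.8)
    losses.append(loss)

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}")
```

The loss shrinks with each pass, which is the "error checking" at work: measure the mistake, trace it back, adjust, repeat.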
How many times during the day do you use ChatGPT, or some other large language model like Google’s Bard or Meta’s Llama 2? How about Google Search? A few, a dozen, a hundred? If I took a poll, I doubt there would be many of us who don’t use these tools on a daily basis. Or have you ever logged in to LinkedIn or some other social media platform and used the language translation feature to look at the profile information for a user from another country? Or even translated text in a word processor from one language to another? If so, you’re using an AI application called natural language processing (NLP), which involves teaching computers to understand and generate human language. NLP systems saw active development in the 1980s. If you used any of the above utilities, or have ever “conversed” with a chatbot or virtual assistant, you’re using NLP. In the field of AI, NLP is one of the most used, researched and developed-for applications; in fact, it’s probably the most used by us everyday folks.
The 1980s also saw significant advances in the field of robotics. The development of advanced sensors, actuators and control algorithms enabled robots to perform more complex tasks, and they were put to use in industrial settings for manufacturing and assembly, as well as in research and space exploration missions—think of Mars probes like Curiosity and Perseverance.
All in all, the ’80s were a remarkable decade for advances in artificial intelligence. But then, history repeated itself: AI was over-promised and under-delivered. Dozens of companies hyped their “advances” in expert systems with showy conferences and glossy pamphlets. Unfortunately, when it came time to actually roll these systems into production, they mostly failed on nearly every count. Low-cost personal computers from companies like Apple and IBM overtook LISP machines in speed and power, and soon, there wasn’t much practical reason for anybody to buy the far more expensive and highly specialized systems. Expert systems began to show their brittleness, meaning they could make huge errors when they were fed unusual inputs, and they became just too expensive to maintain.
By the early 1990s, most commercial AI companies had gone bankrupt. The writing appeared to be on the wall, and the Second AI Winter began.
The 1990s: Machines Begin to Learn
Crazy computers and big-budget science fiction epics aside, the first real-world demonstration of artificial intelligence for the masses occurred in 1997. Most of us remember chess champion Garry Kasparov suffering defeat in a match against IBM’s Deep Blue, a computer that had undergone several name changes. Deep Blue was a milestone in AI implementation, and helped to end the Second AI Winter. An example of the Symbolic Reasoning tribe of AI (more on the five tribes of AI in future articles), Deep Blue used logical rules and symbols to represent knowledge and reasoning. A combination of algorithms and heuristics allowed Deep Blue to evaluate positions on the chess board and make moves. It queried a huge database of past moves and games to inform its decision making during play and it was also able to search many moves ahead. The ability to quickly formulate a series of future moves is crucial to expert play in chess; Deep Blue used a technique called “alpha-beta pruning” to reduce the number of moves it needed to evaluate. Alpha-beta pruning builds on the “minimax algorithm,” which recursively evaluates the possible moves in a game tree and assigns a score to each move based on the outcome it leads to; the pruning skips any branch that cannot change the final choice. This technique is now widely used in electronic versions of chess and similar board games. Deep Blue drove another stake in the ground: Although its scope was fairly narrow (limited to playing chess), it demonstrated the potential of rule-based and symbolic AI approaches for solving specific problems.
The 1990s also saw the development of an AI technique called “reinforcement learning,” which is a type of machine learning that involves training agents to make decisions based on rewards or punishments. You don’t spank a computer when it’s bad or give it candy when it’s good in reinforcement learning; you send it a numeric signal to indicate either a correct or wrong response. The computer then updates its decision-making policy based on this feedback, with the goal of maximizing the cumulative reward signals over the long term. Today, reinforcement learning is used in robotics, game playing and what are called “recommender systems”; for the latter, think Amazon.
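As a rough illustration of "update the policy based on reward signals," here is a small Python sketch using the Q-learning update rule, one well-known reinforcement learning method. The world is an invented five-square corridor with a reward only at the far right end. To keep the result repeatable, the sketch sweeps through every square and action in turn instead of letting an agent explore by trial and error, which is a simplification of how these systems normally learn. The discount and learning-rate values are arbitrary.

```python
GOAL = 4                      # squares are numbered 0..4; square 4 ends the episode
ACTIONS = (-1, +1)            # step left or step right
GAMMA, ALPHA = 0.9, 0.5       # discount factor and learning rate (arbitrary)


def step(state, action):
    """Move within the corridor; reaching the goal pays a reward of 1."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


# Q-table: the estimated long-term reward for each (square, action) pair.
q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}

for _ in range(200):
    for s in range(GOAL):
        for a in ACTIONS:
            nxt, reward = step(s, a)
            future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future.
            q[(s, a)] += ALPHA * (reward + GAMMA * future - q[(s, a)])

# The learned policy: in each square, take the action with the highest estimate.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Only the last step ever pays out, yet the estimates for earlier squares rise too, because each one inherits a discounted share of the reward further down the line. That is the "cumulative reward over the long term" in action.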
Support vector machines (SVMs), which also rose to prominence in the 1990s, are used for classification and regression tasks. SVMs are based on the idea of finding the best linear boundary, or hyperplane, that separates two classes of data. Picture a piece of graph paper on which you’ve drawn a simple graph with an X and a Y axis. Populate your graph with a few hundred dots; these dots represent data points. Now, draw a straight line through the field of dots so that it leaves the widest possible gap between itself and the nearest dots on either side. The dots on one side of the line now belong to one data class, and the dots on the other side of the line form a second class. What you now have is a basic “if not X, then Y” situation, in which the SVM tells you which class a new data point belongs to, based on which side of the line it falls. Today, SVMs have been widely adopted in applications including computer vision, natural language processing and finance.
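The graph-paper picture can be sketched in a few lines of Python. This is a bare-bones linear SVM trained with a simple gradient-style loop on the "hinge loss," which is one common way to fit such a model; production libraries use more sophisticated solvers. The six dots, the learning rate and the other settings are all made up for illustration.

```python
def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=1000):
    """Fit a line w[0]*x1 + w[1]*x2 + b = 0 that separates the two classes."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # Shrinking w slightly on every step favors a wider gap.
            w[0] -= lr * reg * w[0]
            w[1] -= lr * reg * w[1]
            if margin < 1:
                # This dot is misclassified or too close to the line: push the line away.
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b


def predict(w, b, point):
    """Report which side of the line a dot falls on: +1 or -1."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Invented dots: class -1 sits lower left, class +1 sits upper right.
points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (3, 3)), predict(w, b, (-3, -1)))
```

Once the line is in place, classifying a new dot is nothing more than checking which side it lands on.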
The Evolution of the “Bayesians” and the Beginnings of Machine Statistics
Thomas Bayes was an English statistician, philosopher and Presbyterian minister who lived in the 18th century. Bayes formulated a theorem that wasn’t published until after his death: Bayes’ theorem, which is used in probability theory and statistics to update the probability of an event based on prior knowledge and new evidence. If you’ve ever talked to a statistician, chances are you were talking to a person well-grounded in Bayesian mathematics. In AI, Bayesian networks are used in a variety of applications, including decision-making, natural language processing and bioinformatics. “Bayesian inference” is another tribe of AI, and we will cover it more fully in the future.
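A quick worked example shows what Bayes' theorem does. The Python sketch below uses invented numbers: a condition that affects 1% of people, and a test that catches 95% of true cases but also raises a false alarm 5% of the time. The function name is my own.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem: P(condition | positive test)."""
    # Total probability of seeing a positive test, from both groups of people.
    evidence = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / evidence


p = posterior(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05)
print(f"{p:.3f}")   # 0.161
```

Even after a positive result, the chance of actually having the condition is only about 16%, because the prior (1%) was so low to begin with. Combining what you already knew with what you just observed is the heart of Bayesian reasoning.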
Andrey Markov was a Russian mathematician who worked in the late 19th and early 20th centuries. Among other things, he developed a way to use chains to model the occurrence of vowels and consonants in Russian literature. From this came the Markov model, which is a method for modeling randomly changing systems in terms of the probabilities of transitioning between different states. In AI, we use hidden Markov models (HMMs) in pattern recognition and machine learning. HMMs, which were in wide use in AI by the 1990s, are particularly useful for modeling sequences of observations that have a complex or variable structure, such as speech or text data; think medical records or legal documents. The “hidden” part refers to the modeled system having unobservable states. The success of this modeling depends upon also having an observable process that is influenced by the hidden states; the goal is to learn what these hidden states are by watching that observable process. Today, HMMs are used in speech recognition, natural language processing, robotics and time-series analysis.
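To see how hidden states can be recovered from what we observe, here is a short Python sketch of the Viterbi algorithm, a standard method for finding the most likely sequence of hidden states in an HMM. The toy model is a common classroom example and every probability in it is made up: the weather is hidden, and all we see is what a friend did each day.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its probability."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that path came from)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            prob, source = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], source)
        best.append(column)

    # Walk backwards from the most likely final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path)             # ['Sunny', 'Rainy', 'Rainy']
print(round(prob, 5))   # 0.01344
```

We never see the weather directly, yet the activities alone are enough to make a best guess at it. Swap "weather" for "phoneme" and "activity" for "sound" and you have the outline of classic speech recognition.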
Can Machines Reason Without General Intelligence?
When you go to your doctor, he performs an exam to determine what ails you; this usually includes a physical exam, lab work and, perhaps most important, a history, where you describe your current issue in terms of your past experience (or the lack of it) with the same issue. Your doctor will then draw on his knowledge of similar cases involving other patients in both the literature and his own experience and apply that knowledge to make a diagnosis of your situation. One last advance in AI from the 1990s was to give systems the ability to draw similar conclusions from data; this is called case-based reasoning (CBR). In CBR, a problem is first identified and a set of similar cases is retrieved from a case library or knowledge base. The relevant features of the retrieved cases are then compared with the features of the current problem, and the most similar case is selected. The solution from the selected case is then adapted to fit the current problem. CBR is helpful in medical diagnosis, legal reasoning and engineering design. It is also used in recommender systems to suggest products or services based on past user behavior. Look at CBR in this light: Your physician or attorney can’t possibly know and have assimilated all the medical or legal knowledge pertaining to any problem throughout history. But a computer with sufficient memory may be able to. In the future, look for specialized CBR AI systems to operate as electronic physician’s assistants or paralegals; devices capable of offloading a lot of the tedious research work that every doctor or lawyer has to slog through on a daily basis, and returning all of that time to patient care and legal counsel.
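The retrieve, compare, select and adapt steps described above can be sketched in Python as follows. The case library is an invented set of computer troubleshooting cases, the similarity measure (shared features divided by all features) is just one simple choice among many, and the "adaptation" step here is deliberately crude: it only flags the symptoms the old case did not cover.

```python
# A made-up case library: past problems and what fixed them.
CASE_LIBRARY = [
    {"features": {"no power", "fan silent", "no lights"}, "solution": "replace power supply"},
    {"features": {"slow startup", "disk noise"}, "solution": "replace hard drive"},
    {"features": {"no power", "burning smell", "fan silent"}, "solution": "inspect motherboard"},
]


def similarity(a, b):
    """Shared features divided by all features (Jaccard similarity)."""
    return len(a & b) / len(a | b)


def solve(problem, library):
    # Retrieve and select: find the past case most similar to the new problem.
    best = max(library, key=lambda case: similarity(problem, case["features"]))
    # Adapt: reuse the old solution, flagging anything the old case did not cover.
    unmatched = problem - best["features"]
    solution = best["solution"]
    if unmatched:
        solution += " (also check: " + ", ".join(sorted(unmatched)) + ")"
    return solution


answer = solve({"no power", "no lights", "beeping"}, CASE_LIBRARY)
print(answer)   # replace power supply (also check: beeping)
```

A real CBR system would hold thousands of cases and use far richer comparisons, but the loop is the same one your doctor runs in his head: "I've seen something like this before, and here's what worked."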
In part three, we’ll close our history of AI with a look at some of the latest technologies. And starting soon, I’ll show you how to turn your laptop or desktop computer into an AI system of your own!