fault isolation


Determining the cause of a problem. Also known as "fault diagnosis," the term may refer to hardware or software, but it always involves methods that isolate the component, device or software module causing the error. Fault isolation may be part of hardware design, from the circuit level all the way up to the complete system. It is accomplished by building in test circuits and/or by dividing operations into multiple regions or components that can be monitored separately. After the fault is isolated, the failed parts can be replaced manually or automatically (see fault tolerant).

Fault Isolation vs. Fault Detection
Although the terms "fault isolation" and "fault detection" are sometimes used synonymously, fault detection means determining that a problem has occurred, whereas fault isolation pinpoints the exact cause and location.
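
The difference can be illustrated with a minimal Python sketch. The three-stage pipeline, the stage names and the deliberate bug in scale() are all hypothetical and exist only for illustration. Detection compares the final result with an expected value; isolation examines each stage's intermediate output and names the first stage that fails its own check.

```python
from typing import List, Optional

# Hypothetical three-stage pipeline; names and the bug are illustrative only.
def parse(text: str) -> List[int]:
    return [int(tok) for tok in text.split(",")]

def scale(values: List[int]) -> List[int]:
    # Deliberate fault: doubles the values but silently drops the last one.
    return [v * 2 for v in values[:-1]]

def total(values: List[int]) -> int:
    return sum(values)

# Each stage is paired with a check on its own intermediate output.
STAGES = [
    ("parse", parse, lambda inp, out: len(out) == inp.count(",") + 1),
    ("scale", scale, lambda inp, out: len(out) == len(inp)),
    ("total", total, lambda inp, out: out == sum(inp)),
]

def detect_fault(text: str, expected: int) -> bool:
    """Detection: reports only that the end result is wrong."""
    data = text
    for _name, fn, _check in STAGES:
        data = fn(data)
    return data != expected

def isolate_fault(text: str) -> Optional[str]:
    """Isolation: names the first stage whose output fails its check."""
    data = text
    for name, fn, check in STAGES:
        out = fn(data)
        if not check(data, out):
            return name
        data = out
    return None
```

With the input "1,2,3" and an expected total of 12, detect_fault() reports that something is wrong, while isolate_fault() returns the name of the stage responsible.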

Built Into Normal Operation
Software can also be designed and run with fault isolation in mind, and many techniques can be used. For example, program modules can be run in different address spaces to achieve separation. In addition, generating intermediate output that can be examined and recording operational steps in a log help the troubleshooter manually determine which routine caused the application to stop working or to work improperly.
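
As a rough sketch of these two ideas, the Python example below runs each module in its own interpreter process (and therefore its own address space) and records every step in a log. The module names and their one-line sources are hypothetical placeholders, and the approach assumes Python 3.7 or later; it is one possible arrangement, not a prescribed method.

```python
import subprocess
import sys

# Hypothetical modules, given as small source strings so the sketch is
# self-contained. "transform" contains a deliberate fault.
MODULES = {
    "load": "print('loaded')",
    "transform": "raise ValueError('bad record')",
    "report": "print('reported')",
}

def run_isolated(modules):
    """Run each module in a separate process and log every step."""
    log = []
    for name, source in modules.items():
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
        )
        # Record the step: module name, exit code and captured output.
        log.append((name, result.returncode, result.stdout.strip()))
        if result.returncode != 0:
            # The crash stays inside the child process; the parent
            # survives and knows which module failed.
            return name, log
    return None, log

failed, log = run_isolated(MODULES)
```

Here failed holds the name of the module that exited with an error, and log shows which steps completed before it, which is the kind of record a troubleshooter would examine.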

In a network, intelligent agents can be placed in various nodes to continuously collect traffic statistics, which are analyzed in real time to detect and pinpoint the fault. See fault detection.
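
A simplified Python sketch of the agent idea follows. The NodeAgent class, the packet-loss metric, the node names and the 5% threshold are all assumptions chosen for illustration; real network agents collect far richer statistics.

```python
from statistics import mean

class NodeAgent:
    """Hypothetical agent that records packet-loss samples for one node."""

    def __init__(self, node):
        self.node = node
        self.samples = []

    def record(self, sent, received):
        # Fraction of packets lost in this sampling interval.
        loss = 0.0 if sent == 0 else (sent - received) / sent
        self.samples.append(loss)

    def loss_rate(self):
        return mean(self.samples) if self.samples else 0.0

def analyze(agents, threshold=0.05):
    """Detection: is any node over the threshold? Isolation: which ones?"""
    suspects = [a.node for a in agents if a.loss_rate() > threshold]
    return bool(suspects), suspects

agents = [NodeAgent("router-a"), NodeAgent("router-b"), NodeAgent("router-c")]
agents[0].record(1000, 998)
agents[1].record(1000, 700)   # heavy loss at this node
agents[2].record(1000, 1000)
fault_detected, suspects = analyze(agents)
```

The first value returned by analyze() answers the detection question, and the list of suspect nodes answers the isolation question.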