Joseph George Caldwell, PhD; Topic No. AF161-045; Proposal No. F161-045-1533

Volume Two

Automated Bayesian Data Fusion Analysis System (ABDFAS)

Joseph George Caldwell, PhD
1432 N Camino Mateo, Tucson, AZ 85745 USA
Tel. (520)222-3446, e-mail [email protected]

(1) Identification and Significance of the Problem or Opportunity

This most recent SBIR solicitation underscores the need for improved methods of data fusion (sensor fusion, information fusion, multisensor fusion, multisource multisensor fusion). The following are selected solicitation titles on this subject:

1. AF161-045 Information Fusion to Enable Shared Perception between Humans and Machines
2. AF161-056 Fusion of Multiple Motion Information Sources
3. AF161-059 Event Recognition for Space Situational Awareness
4. AF161-153 Fusion of Kinematic and Identification (ID) Information
5. A16-043 Enterprise Enabled Intelligent Agents to Optimize Intelligence, Surveillance, and Reconnaissance (ISR) Collection
6. A16-037 Predicting, Prognosticating, and Diagnosing via Heuristics and Learned Patterns
7. N161-020 Human Computer Interfacing (HCI) for Autonomous Detect and Avoid (DAA) Systems on Unmanned Aircraft Systems (UAS)

This proposal describes an effort to develop a general-purpose computer-software program for constructing, analyzing, evaluating and optimizing data fusion systems. Specifically, the goal is to develop an automated tool that can construct Bayesian networks, analyze their performance, and optimize their performance. The system name is Automated Bayesian Data Fusion Analysis System (ABDFAS).

In this proposal, we use the term "data fusion" as a general term to refer to what is more specifically called sensor fusion, multisensor fusion, multisource multisensor fusion, and information fusion. A standard definition of data fusion is that provided by the Joint Directors of Laboratories (JDL) Subpanel on Data Fusion: "Data fusion is a process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations and threats, and their significance. The process is characterized by continuous refinements of its estimates and assessments, and the evaluation of the need for additional sources, or modification of the process itself, to achieve improved results."

The JDL model identifies five levels of data fusion: (0) Sub-object data assessment; (1) Object assessment; (2) Situation assessment; (3) Impact (threat) assessment; (4) Process refinement / resource management (adaptive data acquisition and processing). Levels 0 and 1 involve statistical analysis of object physical characteristics or dynamics (e.g., identification of a reentry vehicle, estimation of a ballistic trajectory), and effective techniques are available for working with such attributes (e.g., the Kalman-Bucy filter). Levels 2-4 involve estimation of relationships among battlefield entities and estimation of causal effects, and are substantially more difficult to address. This effort will focus on Levels 2-4, since those are the areas that are more difficult to address technically, and for which the need and demand for improved methodology are greater.
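
As a concrete illustration of the Level 0/1 techniques mentioned above, the recursive estimator at the heart of Kalman filtering can be sketched in a few lines. The example below is a minimal scalar filter tracking a hypothetical static target through Gaussian measurement noise; all numbers are illustrative, and the sketch is not a component of the proposed system:

```python
import random

def kalman_1d(measurements, q=0.0, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter. q is the process-noise variance (zero here,
    because the hypothetical target is static); r is the measurement-noise
    variance. Returns the sequence of filtered state estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                  # predict step (random-walk state model)
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # pull the estimate toward the measurement
        p = (1.0 - k) * p          # updated estimate variance
        estimates.append(x)
    return estimates

random.seed(0)
truth = 5.0  # hypothetical ground-truth value (e.g., a range)
zs = [truth + random.gauss(0.0, 1.0) for _ in range(200)]
est = kalman_1d(zs)
```

With zero process noise the filter reduces to a recursive running mean that converges on the true value as measurements accumulate; a nonzero q would let the estimate track a moving state.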

There is a vast literature on the subject of data fusion. A quick search of the Internet identifies two sources that present an illustrative summary of the subject and a sample technique:

1. Castanedo, Federico, "A Review of Data Fusion Techniques," The Scientific World Journal, Vol. 2013 (2013), Article ID 704504, 19 pages, posted at http://www.hindawi.com/journals/tswj/2013/704504/

2. Pan, Heping, Nickens Okello, Daniel McMichael and Mathew Roughan, "Fuzzy Causal Probabilistic Networks and Multisensor Data Fusion," Cooperative Research Centre for Sensor Signal and Information Processing, SPRI Building, Technology Park Adelaide, The Levels, SA 5095, Australia, invited paper for SPIE International Symposium on Multispectral Image Processing, October, 1998, Wuhan, China, SPIE Proceedings, Vol. 3543, posted at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.8434&rep=rep1&type=pdf

The preceding articles describe many approaches to data fusion and illustrate a particular one, viz., fuzzy Bayesian networks.

The motivation for proposing to develop a tool for assisting data fusion is revealed in the paragraphs that follow, which discuss the challenges faced by present-day data fusion applications.

Challenges facing current data fusion applications

1. Data and information overload. A major problem facing current data fusion applications is the massive amount of data that is generated and requires processing and analysis. There are two aspects to this problem. First is simply the amount of raw data that must be handled. To be useful, these data must be stored, processed and analyzed. Prior to at least a minimal level of processing and analysis, it is often not known in advance which data are important. This fact requires that a large amount of data be stored, at least for a time. As the total amount of data increases, the likelihood that useful information will go undiscovered increases. Second, many statistical and optimization procedures suffer from the "curse of dimensionality," which refers to the fact that the number of computations required for analysis increases exponentially with the number of entities involved – objects, events, sources, sensors, variables, observations (in time or in space), missing values, factors, parameters and hypotheses. Adding more sources and sensors may theoretically improve accuracy and decision performance (reduce false negatives and false positives), but in practice it may not improve system performance because of an inability to process the additional data in a timely fashion.
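
The exponential growth referred to above is easy to quantify. A full joint distribution over n binary variables has 2^n − 1 free parameters, while a Bayesian network whose nodes have bounded fan-in grows only linearly in n. A brief sketch (the fan-in bound of three is an arbitrary illustrative choice):

```python
def joint_table_size(n_vars, n_states=2):
    """Free parameters in a full joint distribution over n discrete variables."""
    return n_states ** n_vars - 1

def network_table_size(n_vars, n_states=2, max_parents=3):
    """Upper bound on parameters when each node has at most max_parents
    parents, as in a sparse Bayesian network: one small CPT per node."""
    return n_vars * (n_states ** max_parents) * (n_states - 1)

# Full joint vs. sparse-network parameter counts for 10, 20 and 30 variables.
full = [joint_table_size(n) for n in (10, 20, 30)]
sparse = [network_table_size(n) for n in (10, 20, 30)]
```

At 30 binary variables the full joint table already exceeds a billion entries, while the bounded-fan-in network needs only a few hundred parameters; this is the structural economy that makes Bayesian networks attractive for data fusion.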

2. Inability to convert information into intelligence (context-relevant meaning); difficulty of causal modeling and analysis. Sensor systems produce data, and in some instances they automatically process data to the point where they make a decision. In some applications, the data preprocessing is done based solely on observed associations in the data, without consideration of a causal model (that specifies which variables may affect other variables, and which variables are the source of a threat). By its nature, preprocessing reduces the amount of data that must be handled. If the preprocessing does not take into account a causal model that specifies which variables affect other variables, valuable data may be discarded in error, estimates may be seriously biased, and the power of statistical tests may be seriously degraded. In general, over all disciplines, statistical inference often fails to take causal relationships into account. The fundamental source of this problem is that almost all of modern statistical theory is concerned with associative analysis, not causal analysis. In almost no statistics texts is the word "causal" used. The reason for this is that most of statistical theory is concerned simply with estimating the strength of associations, rather than with estimating the magnitude of causal relationships. Furthermore, even in the realm of causal analysis, almost all of classical statistics is concerned with estimating the effects of causes, not with identifying the causes of effects. The detection or identification of threats, intents, situations and activities requires the use of causal models, not non-causal (associative-only) models.

Unfortunately, causality cannot be inferred from associative analysis alone (i.e., analysis of observational data without benefit of randomized forced interventions or a specified causal model; "correlation is not causation"). Some of the procedures and methods used in multisensor fusion at present are based on associative models, not on causal models. Association-only models may work well for data fusion Levels 0 and 1, but they are not appropriate for Levels 2-4, where causal relationships among variables are an essential aspect of the application. If causal relationships are important in an application but are not properly taken into account, the conceptual framework of such methods is fundamentally flawed, and it is unreasonable to expect high performance from systems based on such (associative) methods. In a competitive game, the probability of hidden or missing data will depend on key response variables, introducing biases into estimates based on most traditional (associative) statistical methods. In analysis of experimental data, such as in a laboratory experiment or in a clinical trial (randomized controlled trial), forced randomized interventions are used to unequivocally infer causal relationships. In analysis of sensor data, which are observational, not experimental, it is essential at higher levels of data fusion to specify a causal model (i.e., identify which variables affect other variables).
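
The point that correlation is not causation can be demonstrated directly by simulation. In the hypothetical model below, a hidden common cause Z drives both X and Y; X and Y come out strongly correlated even though neither has any causal effect on the other, so an associative analysis alone would be misleading:

```python
import random
import statistics

random.seed(1)
n = 10000
z = [random.gauss(0, 1) for _ in range(n)]           # hidden common cause
x = [zi + random.gauss(0, 0.5) for zi in z]          # Z -> X
y = [zi + random.gauss(0, 0.5) for zi in z]          # Z -> Y (X never enters)

def corr(a, b):
    """Pearson correlation coefficient (population form)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

r_xy = corr(x, y)  # large, despite there being no causal X -> Y link
```

Conditioning on the confounder Z (e.g., via partial correlation) would show the direct X-Y association vanishing; taking that step requires a causal model that names Z as a common cause, which is exactly the information associative analysis lacks.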

2

Page 3: Automated Bayesian Data Fusion Analysis System Web viewVolume Two. Automated Bayesian Data Fusion Analysis System (ABDFAS) Joseph George Caldwell, PhD. 1432 N Camino Mateo, Tucson,

Joseph George Caldwell, PhD; Topic No. AF161-045; Proposal No. F161-045-1533

In a military setting, the adversary is attempting to hide data from the observer. In this setting, "selection effects" are present that introduce a stochastic relationship between the response variable (dependent variable, explained variable) and the missingness event. In technical terms, the nonresponse mechanism is not "ignorable" relative to inference methods based on the likelihood function (either classical methods or Bayesian methods). If the causal relationships are not correctly taken into account, and if the missing-data mechanism is not correctly taken into account, estimates will be biased and the error probabilities (significance level, power) of statistical tests will be incorrect.

3. Specialized nature of standard statistical and optimization methods and procedures; need for model validation. A very large proportion of the standard (basic, textbook) methods and procedures of modern statistics and optimization theory apply to very specialized situations, such as restriction of the probability distribution function describing a phenomenon to a small number of variables, or to a particular functional class (such as the class of stationary distributions, or exponential distributions, or linear models), or dependence on a small number of parameters (such as means, variances and covariances); or restriction of an optimization problem to the case of a single outcome (response) variable of interest; or restriction of a decision framework to a single player, two player, or zero-sum game representation. These special cases rarely represent real-world phenomena well. The assumptions that define these simple models (such as assumptions about the probability distribution of variables of interest) almost invariably do not apply to phenomena observed in the real world. If the assumptions are not correct, the statistical inferences (estimates, tests of hypotheses) will not be correct. To obtain good results from applying statistical methodology, the models must represent the salient features of real-world situations well. In real-world applications, valid models are typically far more complex than standard textbook models. To have confidence that the models on which decisions are based are valid representations of reality, the models must be subjected to validation testing. For simple models, system performance can often be measured by analytical means (formulas). For complex applications, system performance is assessed by means of simulation. At present, no comprehensive automated system exists, possessing all of the features intended for the ABDFAS, for validating general data fusion systems. 
The proposed system would provide a general-purpose simulation framework for assessing the performance of complex data fusion systems. Simulation can accomplish this since the system generates the ground truth (given user specifications), so that system performance can be compared to it.

4. Missing data. Many statistical procedures are designed for situations in which values are known for all variables, over a regular grid of space or time. When missing values occur in some variables, or entire observations are missing, the computational procedures required to produce correct estimates and tests become complicated and laborious (computationally intensive). Which procedures are correct depends on the nature of the missingness, i.e., whether it is missing completely at random (MCAR); missing at random (MAR), i.e., dependent on other explanatory variables; or missing not at random (MNAR), i.e., dependent on the value of a response variable (in which case the missingness phenomenon is not "ignorable" for likelihood-function-based approaches such as maximum likelihood and Bayesian estimation). The occurrence of missing data causes the performance of a sensor system to degrade, both in terms of the quality of the processed data and in terms of the computational burden imposed to handle the missing data (e.g., a large increase in the number of candidate tracks in a correlation / tracking system). Missing data are a prominent feature of many sensor systems, and the algorithms used must be able to accommodate this feature in a correct fashion. Analysis of missing data is complex and difficult, and much statistical processing done in many systems does not handle missing data correctly (i.e., with likelihood-function-based methods). In many situations, the occurrence of missing data is well represented by means of a probability distribution for the missingness phenomenon. Examples include situations where an adversary attempts to conceal his activity, but is not completely successful in doing so, or where physical phenomena introduce noise into a sensor response.
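
The practical consequence of the MCAR/MNAR distinction can be shown with a small simulation. Below, a naive mean computed after MCAR deletion remains nearly unbiased, while the same naive estimator under a hypothetical MNAR mechanism (large values preferentially concealed, as by an adversary) is badly biased; all constants are illustrative:

```python
import random
import statistics

random.seed(2)
full = [random.gauss(10.0, 2.0) for _ in range(20000)]

# MCAR: each value is missing with a fixed probability, independent of value.
mcar = [v for v in full if random.random() > 0.3]

# MNAR: values above the mean are far more likely to be concealed, so the
# missingness depends on the response itself (a non-ignorable mechanism).
mnar = [v for v in full if not (v > 10.0 and random.random() < 0.8)]

mean_full = statistics.fmean(full)
mean_mcar = statistics.fmean(mcar)   # close to the true mean
mean_mnar = statistics.fmean(mnar)   # biased low: the large values are hidden
```

A likelihood-based treatment of the MNAR case would have to model the concealment probability explicitly; simply analyzing the observed values, as the naive mean does, bakes the selection effect into the estimate.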

5. Handling of sparse data. The preceding heading, "Missing data," refers to situations in which it is appropriate to represent the missing-data mechanism by a random variable. In some applications, large data gaps are present, such as having no data at all from a particular source, or having no data for an extended period of time (as caused by intermittent satellite coverage). In some applications, data become very sparse (either because of physical problems such as satellite paths or because an adversary is deliberately reducing observability). Conventional (likelihood-function-based) methods of handling missing data make use of a probability distribution of missingness. Classical (frequentist) methods require substantial amounts of data in order to produce useful estimates and tests (in order to assure that laws of large numbers and the central limit theorem apply). These methods may degrade substantially or break down completely in sparse-data situations (sensor coverage gaps; adversary-caused missing data). Estimation and testing must be able to respond to these situations. In such applications, a Bayesian approach may be appropriate. (If no data are available for an appreciable time, then estimates are derived from the last-updated posterior distribution, until new data arrive.)

6. Disparate data sources; noncommensurate data. Many statistical procedures are designed to accommodate a single phenomenon, e.g., to process radar returns or events of a certain type (such as a noise), and are not able to accommodate data from diverse sources (e.g., COMINT, SIGINT, HUMINT, all source, open source, photographic), or to quickly update estimates as new data arrive. What is needed are decision systems that can accommodate all of the available data relating to a situation or event of interest, regardless of its nature, and update estimates quickly whenever observations arrive, no matter how simple or complex. When data are obtained from disparate sources, it is very important that they be combined with existing data in a way that is consistent with the assumed causal model, and in a way that makes the statistical properties of the resultant estimates and hypothesis tests known. Combining data from disparate sources presents difficulties for classical (frequentist) statistical inference methods, since the standard approach is to combine all of the data into a single likelihood function. The use of Bayesian networks overcomes this difficulty, since the Bayesian network representation allows naturally and easily for updating with any amount of data, ranging from a complete observation or complete sample on all important random variables to a single value of a single random variable (in every case, the posterior distribution is simply recalculated).
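
The incremental-updating property claimed above is easiest to see in a conjugate model. The sketch below uses a Beta prior on a detection probability (the detection-probability framing is an illustrative assumption): updating with the whole batch at once and updating one observation at a time yield the identical posterior.

```python
def beta_update(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior on a detection probability with 0/1
    observations; conjugacy makes each update a simple pair of counts."""
    for obs in observations:
        alpha += obs
        beta += 1 - obs
    return alpha, beta

data = [1, 0, 1, 1, 0, 1, 1, 1]

# Batch update with all of the data at once ...
batch = beta_update(1.0, 1.0, data)

# ... equals incremental updates as single observations trickle in,
# no matter how the arrivals are grouped.
inc = (1.0, 1.0)
for obs in data:
    inc = beta_update(*inc, [obs])

posterior_mean = batch[0] / (batch[0] + batch[1])
```

In a full Bayesian network the same principle holds: evidence on any single variable, from any source, is absorbed by recomputing the posterior, with no requirement that the data arrive as complete, commensurate records.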

7. Difficulty in comparing alternative decision systems. Information collected from multiple sources can be processed in different ways and presented to human operators in different ways. There are often a number of different measures of effectiveness of a decision system, such as the probabilities of false positives and false negatives, and the cost associated with wrong decisions. There is a substantial body of knowledge relating to decision systems analysis, and it is not always employed in designing or selecting a preferred data fusion system.
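
As a sketch of how such comparisons can be made quantitative, the example below computes ROC operating points and the area under the ROC curve (AUC) for two simulated detectors; the score distributions are hypothetical:

```python
import random

random.seed(3)
# Simulated detector scores: threats score higher on average than non-threats.
threats    = [random.gauss(2.0, 1.0) for _ in range(500)]
nonthreats = [random.gauss(0.0, 1.0) for _ in range(500)]

def auc(pos, neg):
    """Area under the ROC curve: the probability that a randomly chosen
    positive outscores a randomly chosen negative (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def roc_point(pos, neg, threshold):
    """One operating point: (false-positive rate, true-positive rate)."""
    tpr = sum(p >= threshold for p in pos) / len(pos)
    fpr = sum(n >= threshold for n in neg) / len(neg)
    return fpr, tpr

strong = auc(threats, nonthreats)          # well-separated detector
weak = auc([t * 0.2 for t in threats],     # degraded detector: shrunken scores
           nonthreats)
```

Given the costs of false positives and false negatives, a preferred operating threshold can then be chosen on the ROC curve, which is the decision-theoretic comparison the paragraph above calls for.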

8. Difficulty in defining an optimal "human / machine" interface. In most systems, data are preprocessed to a certain extent, and then presented in summary form to human decision makers. In order to achieve optimal performance, it is necessary to define the nature of this interface. If causal relationships are important to the application (e.g., at higher levels of data fusion), it is essential to recognize that the human being must specify the nature of the causal relationships among system variables. For this reason, it is necessary for the human being to remain a part of the loop that determines how the sensor system preprocesses data. The human / machine interface should be defined in a way that makes effective use of the advantages of human beings and of statistical methodology. The human / machine interface can be configured in many different ways, and the performance characteristics of these alternatives can vary substantially. Automated means are not generally available to assist this determination, to achieve optimal or near-optimal system performance.

9. Sensor allocation and tasking. In some systems there are opportunities for altering the data collection process, by varying the distribution of sensors over time and space, and varying the tasking of the sensors. It is desired to distribute and task sensors in a way that optimizes the value of the collected information, with respect to the quality of the decisions being made on the data. For many systems, there is no quantitatively defined analytical link between the sensor distribution and decision quality, and the process of determining optimal allocations and tasking is done heuristically. In these applications, the allocation of limited resources may be done suboptimally, and it may not be known to what extent system performance could be improved by the use of mathematically rigorous optimizing procedures.
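
The Generalized Lagrange Multiplier method (Everett's method), named among the proposed methodologies elsewhere in this proposal, is one such mathematically rigorous procedure. The sketch below applies it to a hypothetical budget-constrained sensor-tasking problem: for a fixed multiplier the inner maximization decouples across tasks, and a one-dimensional search on the multiplier enforces the budget. All payoffs and costs are invented for illustration:

```python
# Hypothetical sensor-tasking problem: each candidate task has a payoff and a
# cost (units illustrative). Select tasks under a total cost budget.
tasks = [  # (payoff, cost)
    (9.0, 4.0), (7.0, 3.0), (6.0, 5.0), (4.0, 2.0), (3.0, 3.0), (1.0, 2.0),
]
budget = 9.0

def select(lam):
    """Unconstrained inner maximization for a fixed multiplier lam:
    keep every task whose payoff exceeds its priced-out cost."""
    return [i for i, (p, c) in enumerate(tasks) if p - lam * c > 0]

def everett(budget, lo=0.0, hi=10.0, iters=60):
    """Bisect on the multiplier until the chosen set just fits the budget."""
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        cost = sum(tasks[i][1] for i in select(lam))
        if cost > budget:
            lo = lam  # overspending: raise the price of cost
        else:
            hi = lam  # feasible: try a cheaper price
    return select(hi)

chosen = everett(budget)
spent = sum(tasks[i][1] for i in chosen)
value = sum(tasks[i][0] for i in chosen)
```

Everett's theorem guarantees that the selected set is optimal for whatever budget it actually spends; when the constraint is not met exactly, a (typically small) duality gap can remain.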

10. Lack of sufficiently fast algorithms. A statistical approach to estimation in complex models with substantial missing data is the Expectation-Maximization (EM) algorithm. As the amount of data involved in an analysis increases to very large amounts, as the complexity of the model increases, and as the amount of missing data increases, the amount of processing that is required to properly analyze the data increases dramatically. To be of practical value in demanding circumstances (where decisions are required quickly), it is necessary that decision systems be able to accomplish the needed processing within a reasonable amount of time (where "reasonable" depends on the application). Reliable decision systems should incorporate a capability to tailor the processing procedures to the workload burden and time constraints imposed by the situation. It is desired to assess the loss of performance by using "fast algorithms" or by discarding data to cope with data / information overload.
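
For reference, the E and M steps of the algorithm named above can be sketched on a toy problem: a two-component Gaussian mixture in which the unobserved component labels play the role of the missing data. The unit-variance assumption and all constants below are illustrative:

```python
import math
import random

random.seed(4)
# Synthetic data from two Gaussian components; each point's component label
# is unobserved, and those labels are exactly the "missing data" of EM.
data = ([random.gauss(-2.0, 1.0) for _ in range(300)] +
        [random.gauss(3.0, 1.0) for _ in range(300)])

def em_two_gaussians(xs, iters=50):
    """EM for a two-component Gaussian mixture with unit variances:
    the E-step computes membership responsibilities, and the M-step
    re-estimates the two means and the mixing weight."""
    mu1, mu2, w = min(xs), max(xs), 0.5
    for _ in range(iters):
        # E-step: posterior probability each point came from component 1.
        resp = []
        for x in xs:
            p1 = w * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1 - w) * math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means and updated mixing weight.
        s = sum(resp)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / s
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / (len(xs) - s)
        w = s / len(xs)
    return mu1, mu2, w

mu1, mu2, w = em_two_gaussians(data)
```

The cost concern raised above is visible even here: every iteration touches every data point, so run time scales with both the sample size and the number of iterations, and grows rapidly with model complexity.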


11. Inadequate probabilistic framework. In order to use statistical theory as a basis for estimation, prediction, hypothesis testing and control, it is necessary to maintain knowledge of the joint probability distribution of the variables of interest. To do this requires correct specification of the interrelationships among the important variables of a system. If the interrelationships are incorrectly specified, or if important variables are omitted from the model, or if important relationships are overlooked, the probability distribution of important response variables will be incorrect, and estimates and tests of hypotheses will be corrupted. In some complex data fusion systems, heuristic procedures are used in such a way that knowledge of these probability distributions becomes lost or corrupted. In some applications, the probability of missing data is not considered, or is specified incorrectly. In these cases, it is not possible to make sound statements about the precision of estimates, the power of statistical tests, the magnitude of error probabilities (false positives, false negatives), or other system attributes (such as expected loss). For a data fusion system to be most useful, it is essential that it maintain knowledge of the joint probability distribution of key variables of interest, so that sound probability statements can be made about system performance (accuracy of estimates and power of tests).
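
Maintaining the joint distribution is exactly what a Bayesian network factorization provides. The toy network below (all probabilities are invented for illustration) represents P(Threat, Clutter, Alarm) as P(T)·P(C)·P(A | T, C), from which any posterior of interest, such as P(threat | alarm), follows by summation over the hidden variable:

```python
from itertools import product

# Hypothetical two-cause network: Threat -> Alarm <- Clutter.
# All probabilities below are illustrative, not from any fielded system.
p_threat = {True: 0.01, False: 0.99}
p_clutter = {True: 0.10, False: 0.90}
p_alarm = {  # P(alarm=True | threat, clutter)
    (True, True): 0.99, (True, False): 0.95,
    (False, True): 0.30, (False, False): 0.01,
}

def joint(threat, clutter, alarm):
    """Joint probability from the network factorization
    P(T, C, A) = P(T) * P(C) * P(A | T, C)."""
    pa = p_alarm[(threat, clutter)]
    return p_threat[threat] * p_clutter[clutter] * (pa if alarm else 1 - pa)

def posterior_threat(alarm=True):
    """P(threat | alarm) by summing the joint over the hidden variable."""
    num = sum(joint(True, c, alarm) for c in (True, False))
    den = sum(joint(t, c, alarm) for t, c in product((True, False), repeat=2))
    return num / den

post = posterior_threat(alarm=True)
```

Because the factorization is explicit, every probability statement the system makes (error rates, expected loss) can be traced back to a well-defined joint distribution, which is the property this paragraph argues heuristic pipelines lose.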

The preceding paragraphs describe some of the challenges facing data fusion applications. Current systems address these problems in different ways, and to differing extents. There exists a large variety of data fusion applications, and a wide variety of approaches have been used to address them. While much statistical methodology is available to assist the development and operation of data fusion systems, at present the methodology is somewhat chaotically distributed among several fields of knowledge (statistical inference, data processing, causal modeling and analysis, decision science, psychometrics, optimization), and no integrated general-purpose analytical tool is available to assist the construction, configuration, evaluation, optimization and operation of data fusion systems. It is the goal of this effort to develop a comprehensive automated computer software system that will accomplish all of these functions in an integrated, effective and efficient manner. The Phase I effort will develop a basic prototype that will address many of the features found in real-world data fusion applications. If feasibility is established, the Phase II effort will develop a commercially useful general-purpose tool. Phase III would consist of using this tool to assist the development, configuration, operation, optimization and evaluation of planned or existing data fusion systems, in both military and commercial applications.

Examples of military data fusion applications include those of the seven SBIR topics listed above (configuration of the human / machine interface; fusion of multiple motion information sources; event recognition for space situational awareness; fusion of kinematic and identification information; optimization of intelligence, surveillance and reconnaissance (ISR) collection; predicting, prognosticating and diagnosing via heuristics and learned patterns; and human computer interfacing for autonomous detect and avoid systems on unmanned aircraft systems), as well as the following:

Detection, location, tracking and identification of military entities
Allocation and tasking of sensors (radar, synthetic aperture radar (SAR), sonar, infrared, electro-optical imaging)
Situation and threat assessment for airborne early warning and control (AEW&C)

Examples of non-military applications include:

Air traffic control
Law enforcement
Homeland security
Medical diagnosis
Video surveillance
Feature extraction, classification
Robotics (manufacturing, hazardous applications)
Remote sensing (crops, weather, environment, hazardous waste, geological resources)

(2) Phase I Technical Objectives

It is proposed to develop an Automated Bayesian Data Fusion Analysis System (ABDFAS) that incorporates all of the following methodologies: causal modeling (specifically, directed acyclic graphs (Bayesian networks)); Bayesian statistical inference; proper (likelihood-function-based) treatment of missing values; state-of-the-art estimation procedures (e.g., the Expectation-Maximization algorithm); simulation of ground truth; simulation to estimate variances ("bootstrapping"); Bayesian treatment of large data gaps and non-commensurate data; fast algorithms; statistical decision theory; psychophysics (Receiver Operating Characteristic (ROC) analysis); and Lagrangian optimization (Generalized Lagrange Multipliers). The system will be an automated computer software system (Microsoft Windows operating system) that is easy to use. It will be an integrated system that includes all aspects of the analysis (sensor system specification, data generation (simulation), system reconfiguration, sensor management and optimization, decision system analysis (ROC analysis), and system performance assessment) as modules. It will be a mathematically rigorous system, based on sound principles of causal modeling and analysis, treatment of missing and sparse data, and optimization.
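
The "bootstrapping" item above can be sketched briefly: resample the observed data with replacement, recompute the estimator on each resample, and take the spread of the replicates as the standard-error estimate. The data below are simulated purely for illustration, and the analytic formula for the mean is included only as a cross-check:

```python
import random
import statistics

random.seed(5)
sample = [random.gauss(0.0, 3.0) for _ in range(100)]

def bootstrap_se(data, stat=statistics.fmean, reps=2000, rng=random):
    """Bootstrap standard error of a statistic: resample with replacement,
    recompute the statistic on each resample, and take the replicates'
    standard deviation."""
    replicates = [stat(rng.choices(data, k=len(data))) for _ in range(reps)]
    return statistics.stdev(replicates)

se_boot = bootstrap_se(sample)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5  # analytic check
```

The value of the bootstrap is that the same recipe works unchanged for estimators with no convenient analytic variance formula, which is the usual case for the complex fused estimates contemplated here.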

The Phase I work will explain the proposed concept in detail and illustrate its application in a number of different application examples (representative of the preceding list of titles). The examples will be constructed using computer simulation. That is, a situation will be specified, in terms of an environment and the data sources and sensors monitoring the environment, and a data set will be generated corresponding to this environment and source / sensor configuration. The data will then be analyzed using the proposed methodology (Causal Bayesian Analysis: causal modeling using Bayesian networks and causal analysis using Bayesian estimation), and the results of the analysis described in terms of a variety of performance measures (e.g., precision, bias, decision error types, computing workload, computing time, analysis demands). It is not planned to construct a parallel analysis using classical (non-Bayesian, non-causal (associative)) methodology, but the new methodology will be compared to the classical approach in qualitative, subjective terms.

The goal of Phase I is to demonstrate the feasibility of the approach through concrete examples, to characterize the performance of the approach, and to compare and contrast the approach to existing approaches. There is little doubt that the Phase I effort will be successful in establishing feasibility, since the goal is to develop a system that utilizes known technologies. Over the past few decades, much progress has been made in developing practical implementations of rigorous methods for conducting causal analysis and handling missing values, including the Expectation-Maximization algorithm, the Neyman-Rubin causal model, the Heckman model, and the use of probabilistic networks to represent causal systems. Other technologies, such as statistical decision theory, Bayesian inference, the Receiver Operating Characteristic graph and the Generalized Lagrange Multiplier method, are older and well established, and are very appropriate for this application. What is novel about the proposed effort is the integration of a wide variety of relevant technologies, the automation, and the comprehensiveness of the system capabilities.

Developmental computer processing will be done on a late-model commercial microcomputer (e.g., HP or Dell using a Windows 8.1 (or later version) touch-screen operating system), using Microsoft development and application software (Visual Basic, Access, Excel, PowerPoint, Word). Statistical analysis will be done in Phase I using the Stata statistical software system (but it is intended that in Phase II, all statistical analysis functions will be self-contained, with no use of external statistical software packages).

(3) Phase I Statement of Work

The Phase I effort will accomplish the following tasks.

Task 1. Prepare technical report describing the Automated Bayesian Data Fusion Analysis methodology and a prototypical Automated Bayesian Data Fusion Analysis System in detail.

Task 2. Identify and describe several sample applications to illustrate the Automated Bayesian Data Fusion Analysis methodology. These examples will include, taken together, the following features:

1. An application that must operate in near-real time.
2. An application that need not operate in near-real time.
3. An application with multiple sensors of a single type (source).
4. An application with multiple sources.
5. Parametric and nonparametric (and perhaps semiparametric) applications.
6. An application with a small amount of missing data.
7. An application with large data gaps.
8. An application involving optimal deployment and employment of sensors.
9. An application that involves multiple time series.


10. An application in which data are missing not at random (i.e., the probability of missingness depends on the value of the dependent variable).

For each sample application, generate sample data that can be used to demonstrate Causal Bayesian Analysis, and perform the demonstration, i.e., conduct a Causal Bayesian Analysis of the generated data. Evaluate the system performance by comparing the estimates and decisions to the model used to generate the data (i.e., to "ground truth"). Each sample application will constitute a very simplified illustration of an Automated Bayesian Data Fusion Analysis System.

Task 3. For each sample application, compare the performance of alternative decision systems and/or decision criteria using ROC graphs.

Task 4. Assuming that the Phase I effort confirms the feasibility and desirability of developing Automated Bayesian Data Fusion Analysis Systems based on Causal Bayesian Analysis, develop a plan for Phase II development. Phase II will consist of developing a prototype ABDFAS for one or more sample applications embodying the features listed above.

(a) Description of Approach

Development of the computer software system to implement the preceding capability will be done in compliance with the modern systems and software engineering discipline, as described in references such as Structured Analysis and System Specification by Tom DeMarco (Yourdon Press / Prentice Hall, 1978/79) and Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design by Edward Yourdon and Larry L. Constantine (Yourdon Press, 1975, 1978). A "rapid prototyping" approach will be used. The development will use a microcomputer with Microsoft Windows 8.1 or later operating system, and use the Microsoft software development environment (Visual Studio, Visual Basic, Access, Excel). (The Windows 8.1 operating system allows for touch-screen capability, which enables easy migration of the developed system to a computer tablet.)

The approach to developing the ABDFAS will be described by presenting summary descriptions of its key technologies. These include Bayesian inference, causal modeling and analysis (the representation of causal models by Bayesian networks (directed acyclic graphs), estimation of causal effects), procedures for handling missing data, Receiver Operating Characteristic (ROC) analysis, and Lagrangian optimization using the method of Everett's Generalized Lagrange Multipliers (GLM). These technologies will now be described.

Bayesian Statistical Inference

There are two basic approaches to statistical inference, referred to as the "classical" or "frequentist" approach and the "Bayesian" approach. Under the classical approach, all of the data on which the inference is based is contained in the sample. With this approach, a number of assumptions may be made about the probability distribution from which the sample is assumed to be selected, such as the assumption of a normal distribution, or a constant variance. Whatever unknown parameters are involved in specifying the probability distribution are assumed to be fixed constants, not random variables. In this framework, there is a single probability distribution involved – the distribution defined by specific values of the distribution parameters. This distribution that produces observations is called the sampling distribution. A more descriptive term for the frequentist approach might be the "sample-data-only" approach.

Under the Bayesian approach, it is allowed that unknown parameters defining the probability distribution may be random variables. This framework allows for the incorporation of prior information about the parameters into the statistical inference process. The prior information, or "belief," about likely values of the unknown parameters is represented in a probability distribution, called the parameter distribution. As in classical statistical inference, it is assumed that the data sample is selected from a probability distribution having particular fixed values for the unknown parameters – those parameter values being a particular realization of (sample from) the parameter distribution. As with the classical approach, this distribution, which generates the data sample, is called the sampling distribution. In the Bayesian approach, however, there are two probability distributions involved – the distribution of the parameter(s) and the (sampling) distribution that generates the sample observations (data) for a particular value of the parameter(s). In this framework, the data sample contains information not only about the sampling distribution, but also about the parameter distribution. In the frequentist approach, the sample data are used to make inferences about the sampling distribution. In the Bayesian approach, the sample data are used to make inferences about the parameter distribution. Before the sample is selected, the parameter distribution is called the prior distribution. After the sample is selected, the parameter distribution conditional on the observed sample is called the posterior distribution. The formula for determining the posterior distribution from the prior distribution and the sample is as follows:

f(θ|x) = f(x|θ) f(θ) / f(x) = f(x|θ) f(θ) / ∫ f(x|θ) f(θ) dθ,

where

x = (x1, …, xn) = the sample (the data)
θ = (θ1, …, θk) = the parameters
f(θ) = the prior distribution of θ
f(x|θ) = the sampling distribution of x, given θ
f(θ|x) = the posterior distribution of θ, conditional on x (i.e., given the sample data)

The formula presented above is called Bayes' Formula or Bayes' Rule. At times there has been some controversy about the use of Bayes' Rule in this context. This controversy notwithstanding, Bayes' Rule is correct. It has sometimes been misapplied, as when the integral in the denominator was not taken over the full distribution of θ. One of the objections voiced is that the prior distribution may be subjective, or that its functional form may be assumed with little justification (e.g., in the use of conjugate prior distributions, so that the prior and posterior distributions are from the same family of distributions). Similar charges may be levied against the frequentist approach, for example, when assuming a normal distribution, or homoskedasticity, or zero correlation among the explanatory variables and the residuals (errors) of a model, or that missing values occur at random. (In many cases important assumptions can be tested. When this is not possible, sensitivity analysis may be applied to assess the effect of incorrect assumptions.)

In the Bayesian approach, estimates and statistical tests are based on the posterior distribution (which combines information from the parameter (prior) distribution and the sampling distribution). For example, if a parameter θ is the mean of a distribution, the mean of the posterior distribution may be taken as an estimate of θ.
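As a minimal numerical sketch (not part of the proposed system; the prior and data below are hypothetical), consider a Bernoulli success probability θ with a conjugate Beta prior. Bayes' Rule then gives the posterior in closed form, and the posterior mean serves as the estimate of θ:

```python
# Conjugate Beta-Bernoulli updating: if theta ~ Beta(a, b) a priori and we
# observe k successes in n trials, Bayes' Rule gives
# theta | data ~ Beta(a + k, b + n - k).  (Hypothetical numbers.)

def posterior_params(a, b, n, k):
    """Beta posterior parameters after observing k successes in n trials."""
    return a + k, b + (n - k)

def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution, used here as the estimate of theta."""
    return a / (a + b)

# Prior belief Beta(2, 2) (prior mean 0.5); then observe 7 successes in 10 trials.
a_post, b_post = posterior_params(2, 2, 10, 7)
print(posterior_mean(a_post, b_post))  # 9/14 ≈ 0.643, pulled from 0.7 toward the prior
```

Note how the estimate lies between the sample proportion (0.7) and the prior mean (0.5), with the sample dominating as n grows.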

The use of a Bayesian approach introduces an extra level of complexity in the statistical inference process, viz., the need to specify a prior distribution for the parameters, and to take this prior distribution into account in the inference process. For large sample sizes, there is little advantage to be gained from using the Bayesian approach over the (simpler) frequentist approach. The reason for this is that as the sample size increases, the relative proportion of information (about the parameters) contained in the sample becomes large relative to the amount of information contained in the prior distribution. The Bayesian approach offers substantial advantages in situations in which the sample data are sparse. It is a natural approach to use when additional data are obtained (e.g., from an unexpected source) and it is desired to update an estimate. It works well for combining information from disparate sources, since it is often appropriate to update the current parameter distribution (and obtain the new posterior distribution) with the "new" information without the need to reconstruct the entire likelihood function of the new and old data combined. It also lends itself quite well to work with causal models, as will now be discussed.
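The sequential-updating property noted above can be checked numerically. In the sketch below (hypothetical numbers, not part of the proposal), a Normal mean with known observation variance is updated in two batches, with the first posterior serving as the prior for the second batch; the result matches a single batch update on all of the data:

```python
# Sequential vs. batch Bayesian updating for a Normal mean with known
# observation variance sigma2 (hypothetical numbers).  Updating in two steps,
# with the first posterior used as the new prior, gives the same posterior as
# one update on the combined data -- no reconstruction of the full likelihood
# of old and new data is needed.

def update_normal(m0, v0, xs, sigma2):
    """Posterior (mean, variance) for a Normal mean: prior N(m0, v0), data xs."""
    prec0 = 1.0 / v0                      # prior precision
    prec_post = prec0 + len(xs) / sigma2  # precisions add
    mean_post = (prec0 * m0 + sum(xs) / sigma2) / prec_post
    return mean_post, 1.0 / prec_post

data = [1.2, 0.8, 1.1, 0.9, 1.4, 1.0]

batch = update_normal(0.0, 10.0, data, 1.0)       # all data at once
m1, v1 = update_normal(0.0, 10.0, data[:3], 1.0)  # first batch...
seq = update_normal(m1, v1, data[3:], 1.0)        # ...then the rest
print(batch, seq)  # posteriors agree (up to floating-point rounding)
```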

Causal Modeling and Analysis

Causal Relationships; Probabilistic Theory of Causality

The vast majority of modern statistical analysis is associative. Joint probability distributions do not distinguish between which random variables are or may be causes of other random variables – all of the random variables in the joint distribution are on an equal footing. Causal inference is best done by introducing forced randomized changes in certain random variables, and observing the change in the association between those random variables (the explanatory random variables, independent variables) and other random variables (the explained random variables, dependent variables, response variables). Causality is not inferred from the associations in observational data. It is specified external to the data, either by assumption or by a randomized experiment.


Very few statistics texts use the terms "causal" or "causal relationship" (a possible exception is books on experimental design or clinical trials (randomized controlled trials)). The reason for this is that statistical inference is designed primarily to assess the effects of causes, not to identify the causes of effects. The term that is used (rather than "cause") is "effect." An analysis of variance estimates the "effect" of a drug treatment on mortality. A regression model estimates the "effect" of a training program on wages. When it is not clear what the cause of an observed effect is, it is said that the effects are confounded, not that the causes are confounded. The intervention is specified, and the goal of the statistical analysis is to estimate the magnitude of the effects associated with those interventions (causes). It is not the case that a data set is presented to an analyst and the analyst is asked to determine the cause of certain relationships. The causal relationships are specified, and the analyst is asked to estimate the magnitude of the effects.

A causal relationship is specified in a model description, or it is defined in terms of probabilistic relationships. There are many definitions of the term "causal relationship." A standard one is Suppes' probabilistic definition (or "theory"): C causes E in Ki if and only if P(E|C & Ki) > P(E|Cc & Ki), where C, E and Ki are events (in a probability space), Cc denotes the complement of C, and Ki is a state description over a complete set of confounding factors. C causes E if C and E are probabilistically dependent once we have stratified over a complete set of confounding factors, i.e., taken "everything else" into account. A causal model defines a joint distribution over a set of counterfactual statements (i.e., conditioning on C and Cc).

The standard approach to causal modeling and analysis is through the use of directed acyclic graphs (DAGs), also called Bayesian networks. A comprehensive description of the modern theory of causality is presented in Causality: Models, Reasoning, and Inference, 2nd edition by Judea Pearl (Cambridge University Press, 2009, 1st edition 2000).

In low-level data fusion applications (Levels 0 and 1), the causal relationships are usually obvious. For example, in tracking a ballistic missile, a Kalman filter is specified in terms of a "plant" model that specifies vehicle dynamics and a "measurement" model that specifies observation noise. The causal relationships are specified in the plant model. In this situation, the causal relationships that generate the data are known. What is more significant from the viewpoint of analysis is that the sample data can usually be considered as a single multivariate measurement on an object at each point in time, with associations but no causal relationships among the variables. For this application, the data (radar returns) are generally very regular and missing values are not a serious problem (the plant model "takes over" quite well for a ballistic trajectory, if some observations are missing).

In correlation and tracking of ships, the situation is rather different. In this application, large data gaps may occur (e.g., as a satellite moves out of range), and the system dynamics are unpredictable (a ship may change course). Also, data may be from different sources (e.g., satellite, sonar). Nevertheless, specification of causal relationships among model variables is not the essence of the problem.

In dealing with observation of human activity, at Levels 2-4 of the data fusion paradigm, the situation is quite different. Here, the activity of threat agents may be very similar to the activity of non-threatening actors, and the problem is one of inferring intent, or threat – that is, the cause; or of inferring the effects of a cause in the presence of confounding variables. The threat agent may attempt to hide the observations, so that the response probability is dependent on the very variable of interest. These conditions make it difficult to detect and identify the threat. In order to make valid estimates, and control the error probabilities (false positives, false negatives) it is important to understand the causal relationships among the observed random variables, and take them into account in the modeling and analysis. In a frequentist approach, this is done by including the response probability in the likelihood function. The likelihood function for a sample unit is the product of the probability of a particular response given that the unit is observed times the probability that the unit is observed (i.e., not missing). If the probability of missing is random, no difficulty is introduced into the analysis. If the probability of missing depends on the response, serious problems arise. If this feature is not properly taken into account, the estimates will be biased and the test will be corrupted (error probabilities wrong). Similar problems arise when latent (hidden) variables are an important part of the model specification.
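The bias described above can be made concrete with a small hypothetical calculation: when the probability of being observed depends on the response value itself (missing not at random), the naive mean of the observed data is biased, whereas a correctly specified response model (used here via inverse-probability weighting) removes the bias. All numbers below are hypothetical.

```python
# Hypothetical illustration: data missing not at random (MNAR).  A unit with
# value v responds (is observed) with probability p_obs(v) = 1 / (1 + v), so
# the probability of missingness depends on the value itself.  We work with
# expectations over which units respond, so the result is deterministic.

values = [1, 2, 3, 4, 5]                     # equally likely population values
p_obs = {v: 1.0 / (1 + v) for v in values}   # response model, depends on v

true_mean = sum(values) / len(values)        # 3.0

# Expected naive mean of the observed data: each value enters in proportion to
# its response probability, so small values (which respond more often) dominate.
naive = (sum(v * p_obs[v] for v in values) /
         sum(p_obs[v] for v in values))      # ≈ 2.448, biased low

# Inverse-probability weighting: each observed unit contributes v / p_obs(v),
# which, in expectation, cancels the unequal response rates exactly.
ipw = (sum(p_obs[v] * (v / p_obs[v]) for v in values) /
       sum(p_obs[v] * (1 / p_obs[v]) for v in values))  # = 3.0, unbiased

print(true_mean, naive, ipw)
```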

In some fields, much progress has been made in the development and use of causal models. In statistics, Donald Rubin promoted use of what is now known as the Neyman-Rubin Causal Model. In economics, James Heckman promoted use of latent-variable models (the Heckman Model) to make causal inferences. These developments occurred in the 1970s and 1980s, and are now much-used in analysis of observational socio-economic data (as in economic program impact evaluation). These models are also used in medical studies, although in that field reliance for causal inference rests heavily on use of experimental designs (randomized controlled trials). In other fields, it appears that little progress has been made in using causal modeling and analysis in statistical inference (as evidenced by the dearth of attention to causal modeling in most statistics texts). (There are a few exceptions to this. One is the text, All of Statistics: A Concise Course in Statistical Inference by Larry Wasserman (Springer, 2004).)

In higher-level data fusion applications, the goal is to infer intent (causality), or obtain unbiased estimates of causal effects. One way of addressing this problem is to consider alternative hypotheses, corresponding to different categories or levels of threat, and specify alternative likelihood functions (for the response variable) corresponding to each of them. The issue of assessing causality is then reduced to the problem of standard tests of hypotheses (i.e., rejecting or accepting each hypothesis, based on the sample data). For this approach to work well, it is essential that the likelihood function be correct (for each hypothesis). This includes, critically, that the probability model for missingness be correctly specified and included in the likelihood function of the sample data. To do this, it is essential that the causal model be specified correctly.

Representation of Causal Relationships by Directed Acyclic Graphs (Bayesian Networks)

It was mentioned earlier that a standard method of representing causal relationships among the random variables of a system is by means of Bayesian networks, or directed acyclic graphs (DAGs). An example of a DAG is presented in Figure 1 (below).

A causal model is a mathematical model that describes causal relationships among variables. There are a number of causal theories and definitions of causal relationships. It appears impossible to construct a universally acceptable definition of what a “cause” is, and as a result there are a number of theories of causation.

Causal modeling falls mainly into two categories – deterministic or “logical” causal models, and probability-based models, which involve consideration of stochastic outcomes and the use of statistical analysis to estimate the magnitude of causal effects. This discussion is concerned with the latter category of causal modeling.

A standard approach is to define causes and causation in terms of probability distributions. These theories are known under the rubric of “probabilistic causation.” (See Hunting Causes and Using Them: Approaches in Philosophy and Economics by Nancy Cartwright (Cambridge University Press, 2007).) A simple definition of causation is that an event C is a cause of an event E if and only if P(E|C)>P(E|~C), where the tilde (~) represents complementation (negation, “not”). (That is, the occurrence/presence of one event “raises the probability” of another.) This simple definition encounters difficulties. First, it is symmetric. That is, if P(E|C)>P(E|~C) then P(C|E)>P(C|~E), i.e., if C is a cause of E then E is a cause of C. Another problem is that of “spurious causes”: if C and E are both caused by a third factor, A, then it is possible that P(E|C)>P(E|~C) even though C does not cause E. The variable A, which affects both the cause C and the result E, is called a confounding factor, or “confounder.”
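The spurious-cause situation just described can be checked numerically. In the sketch below (all probabilities hypothetical), A causes both C and E, and E is independent of C given A; yet marginally C "raises the probability" of E:

```python
# Hypothetical probabilities: A causes both C and E, and E is independent of C
# given A -- yet marginally C "raises the probability" of E (a spurious cause).

P_A = 0.5
P_C_given_A = {True: 0.9, False: 0.1}   # P(C | A)
P_E_given_A = {True: 0.8, False: 0.2}   # P(E | A); E independent of C given A

def joint(a, c, e):
    """P(A=a, C=c, E=e) under the model above."""
    pa = P_A if a else 1 - P_A
    pc = P_C_given_A[a] if c else 1 - P_C_given_A[a]
    pe = P_E_given_A[a] if e else 1 - P_E_given_A[a]
    return pa * pc * pe

def p_e_given(c, a=None):
    """P(E | C=c), optionally stratifying on the confounder A=a."""
    a_vals = [a] if a is not None else [True, False]
    num = sum(joint(x, c, True) for x in a_vals)
    den = sum(joint(x, c, e) for x in a_vals for e in (True, False))
    return num / den

print(p_e_given(True), p_e_given(False))                  # 0.74 > 0.26 marginally
print(p_e_given(True, a=True), p_e_given(False, a=True))  # both 0.8 within stratum A
```

Stratifying on the confounder A removes the apparent dependence, exactly as Suppes' definition requires.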

Another problem with the preceding definition is “Simpson’s paradox”: for any data set, it is possible to reverse any conditional probabilistic relationship between two variables with the addition of another variable to the data set. For example, the expected grades for males at a school may be higher than those for females overall, but if conditioned on course subject, the expected grades for females may be higher in every course subject than for males (e.g., if males tend to take easier courses). (In symbols, P(E|C)<P(E|~C), but P(E|C&B)>P(E|~C&B) and P(E|C&~B)>P(E|~C&~B).) Additional reversals may be caused by additional variables. While Simpson’s paradox may not occur very often, it is not at all surprising. It illustrates the fact that it is not possible to infer causal relationships from analysis of data alone. Once a causal model is specified, Simpson’s so-called “paradox” is resolved.
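Simpson's paradox is easy to reproduce with hypothetical enrollment counts along the lines of the grades example above: females outperform males within each course, yet males have the higher pass rate overall, because males cluster in the easier course.

```python
# Hypothetical counts illustrating Simpson's paradox.  Entries are
# (number passing, number enrolled) for each (sex, course) cell.
counts = {
    ("male",   "easy"): (60, 80),
    ("male",   "hard"): ( 4, 20),
    ("female", "easy"): (18, 20),
    ("female", "hard"): (24, 80),
}

def rate(sex, course=None):
    """Pass rate for a sex, within one course or aggregated over all courses."""
    cells = [(p, n) for (s, c), (p, n) in counts.items()
             if s == sex and (course is None or c == course)]
    return sum(p for p, n in cells) / sum(n for p, n in cells)

print(rate("female", "easy"), rate("male", "easy"))  # 0.90 > 0.75
print(rate("female", "hard"), rate("male", "hard"))  # 0.30 > 0.20
print(rate("female"), rate("male"))                  # 0.42 < 0.64: the reversal
```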

It may be thought that the preceding issue of the symmetry of a probabilistic relationship can be resolved by placing an external constraint (outside of the probability relationship) that one variable may be a cause of another variable only if it is temporally precedent. Even temporal precedence is not sufficient to establish a causal relationship. For example, a barometer reading falls prior to a storm, but the barometer reading does not cause the storm.


It may be said that a causal relationship exists between two variables if they are probabilistically dependent when randomization-based realizations (forced changes, selections) are made in one of them. This definition works after a fashion, but it has drawbacks: it involves an experiment, and it does not unequivocally establish causality (since an hypothesized causal relationship may be established only as the result of a statistical decision). Some would require that forced changes (controlled manipulations), not just random selections, be made to establish a causal relationship. Some object to this definition since it conflates the concept of causality with a means for measuring it. Unfortunately, no better definition of causality is available.

In causal analysis, the difficulty in defining a (real) cause and a (real) causal relationship are resolved – or sidestepped – by defining these terms in the context of a model. The situation is similar to the difficulty of defining a probability in terms of the result of a series of experiments, vs. defining it in terms of set and measure theory. In this discussion, a causal relationship will be considered to be a “primitive,” i.e., it is not defined in terms of more elementary concepts.

A causal relationship is always relative to all other variables that may affect the causal variable or the effect variable, i.e., the variables that define the “setting” or “environment” in which the cause-effect relationship is being studied. The “causal effect” of one variable on another may be viewed either in the context of holding all other variables constant (a conditional causal effect), or averaging over them in some population (an average causal effect).

Since the 1970s, much work has been done in the field of causal modeling and analysis. In this discussion, we are concerned only with theories of probabilistic causation that lend themselves to the use of statistical methods to assess the magnitudes of causal effects. The most significant early work was that of Rosenbaum and Rubin (1983), who extended Neyman’s theory of potential outcomes (“counterfactuals”) from the realm of designed experiments to the analysis of observational data (passively observed data).

There are two significant problems associated with the Rosenbaum-Rubin approach. First, it is founded on assumptions about potential outcomes; some people have difficulty in accepting this framework, and prefer a theory that is expressed in terms of actual observations (and does not involve explicit consideration of hypothetical counterfactual observations). The second problem is that the theory does not lend itself to detailed specification and analysis of causal relationships, i.e., of incorporating detailed (variable-to-variable) prior knowledge or beliefs about the nature of the causal relationships in the system under study. With the Rosenbaum-Rubin theory, it is necessary to make an assumption about conditional independence of potential outcomes and treatment given covariates, but the theory does not provide a means for investigation of conditions under which the conditional independence may hold. Many researchers would prefer to make a detailed specification of causal relationships (such as a path diagram, or a set of structural equations, or a DAG), and have the theory provide a means for assessing whether conditional independence holds, given their specification.

If one variable is causally related to another, we use the expression that it “affects” (or “is affected by”) the other variable. If there is an apparent relationship between the two variables (e.g., they are correlated), but it is not clear that it is a causal relationship, we use the expression that one variable “is related to” or “is associated with” or “is correlated with” the other.

Much of statistics is descriptive or associative inference, where it is of interest to estimate the strength of associative relationships. In many technical discussions, reference is made to “effects,” without mentioning whether they are associative or causal. Often, the terms “treatment” and “cause” are used interchangeably. Paul Holland asserts that attributes may not be causes if they cannot be manipulated. (Holland, Paul W., “Statistics and Causal Inference,” Journal of the American Statistical Association, Dec. 1986, vol. 81, no. 396, pp. 945 – 960. This article is limited to consideration of experimental data, not observational data.) In this discussion, we focus on estimation of the effects of causes that are identified in a causal model, not on the problem of deciding whether a variable is a cause of a specified effect (e.g., whether smoking is a cause of lung cancer).

The terms “causal modeling” and “causal analysis” are often used interchangeably. “Causal modeling” tends to refer to the activity of specification and description of causal models (e.g., by means of directed graphs and descriptions of conditional probability distributions (such as model equations)) and to determination of which causal effects are estimable, whereas “causal analysis” tends to refer to the use of statistical methodology (design and analysis) to estimate the strength of estimable causal relationships (e.g., the magnitude of an effect in a response variable caused by (associated with, following in time) a specified change in an explanatory variable).

Figure 1. Example of a “generic” causal diagram (directed acyclic graph (DAG), Bayesian network). [Figure not reproduced; it contains six nodes, labeled X1 through X6.]

In 2000, Judea Pearl published a book (Causality: Models, Reasoning, and Inference, 2nd edition, Cambridge University Press, 2009 (1st ed. 2000)) in which he presented a comprehensive description of a probabilistic causality theory that is of broad applicability and relatively easy to apply. The theory allows the analyst to specify causal knowledge and beliefs in detail using a family of models based on directed acyclic graphs (DAGs). Pearl identifies two criteria that can be used to quickly determine, from the causal model’s DAG, whether a specified causal effect is estimable.

There are two main aspects to causal modeling and analysis. The first aspect, or “qualitative aspect,” is concerned with the description of causal models and the determination of whether consistent estimates of causal effects can be derived from data, given a causal model. The second aspect, or “quantitative aspect,” is concerned with statistical procedures for estimating the magnitude of causal effects. Together, these two aspects are referred to as “causal modeling and analysis.” Pearl’s theory and book address the first aspect of causal modeling and analysis (the qualitative aspect: description of causal models; assessment of estimability).

Pearl’s theory of causality will now be summarized very briefly. This summary draws from Pearl’s book, Causality (op. cit.). The definitions presented below are almost verbatim from Pearl’s book.

A causal structure (or causal diagram) of a set of variables V is a directed acyclic graph (DAG) in which each node corresponds to a distinct element of V, and each link represents a direct functional relationship among the corresponding variables.

A causal model (Pearl, op. cit., p. 44) is a pair M = (D, ΘD) consisting of a causal structure D and a set of parameters ΘD compatible with D. The parameters ΘD assign a function xi = fi(pai, ui), i = 1,…,n, to each Xi in V and a probability measure P(ui) to each ui, where PAi are the parents of Xi in D and where each Ui is a random disturbance distributed according to P(ui), independently of all other u. (Variable names are denoted in upper case, and specific values of variables in lower case.)
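A minimal sketch of this definition follows, with hypothetical linear functions fi and an illustrative edge set (it does not reproduce Figure 1 exactly; X1 is treated as an unobserved background variable, folded into the disturbances). The do-operator is included to show how an intervention replaces one structural equation while leaving the others untouched:

```python
# Sketch of Pearl's definition: each variable X_i is assigned a function
# x_i = f_i(pa_i, u_i) of its parents pa_i and a disturbance u_i.
# Hypothetical linear functions; the edge set is illustrative only.

def evaluate(u):
    """Solve the structural equations in topological order; u maps
    disturbance names to values (u1 plays the role of the latent X1)."""
    x2 = u["u1"] + u["u2"]                # f2: X2 <- X1 (latent)
    x3 = 0.5 * u["u1"] + u["u3"]          # f3: X3 <- X1 (latent)
    x4 = u["u4"]                          # f4: X4 exogenous
    x5 = 0.8 * x2 + 0.2 * x4 + u["u5"]    # f5: X5 <- X2, X4
    x6 = 1.5 * x5 + 0.3 * x3 + u["u6"]    # f6: X6 <- X5, X3
    return {"x2": x2, "x3": x3, "x4": x4, "x5": x5, "x6": x6}

def evaluate_do_x5(u, v):
    """The intervention do(X5 = v): replace f5 by the constant v and
    re-solve only the equations downstream of X5."""
    x = evaluate(u)
    x["x5"] = v
    x["x6"] = 1.5 * v + 0.3 * x["x3"] + u["u6"]
    return x

u = {"u1": 1.0, "u2": 0.0, "u3": 0.0, "u4": 2.0, "u5": 0.0, "u6": 0.0}
print(evaluate(u)["x6"])             # observational value of X6
print(evaluate_do_x5(u, 0.0)["x6"])  # value of X6 under do(X5 = 0)
```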

Figure 1 presents an example of a causal diagram. (We use the terms “causal diagram” and “causal model” somewhat interchangeably. Technically, the two terms are not equivalent: the causal model is a broader concept that includes the causal diagram.)

Each variable or set of variables is represented by a capital letter. If a variable (or set of variables) exerts a causal influence on another variable (or set of variables), then a solid directed line (line with an arrowhead on one end and not the other) is drawn from the former to the latter (with the arrowhead pointing toward the variable that is acted on). A variable that is causing an effect on another is called a causal variable, explanatory variable, independent variable or input variable. A variable that is influenced by a causal variable is called an effect variable, affected variable, response variable, outcome variable, dependent variable or output variable. The key point to this representation of causality is that the direction of the causal relationship is specified in the network (by the directionality of the arrows). This resolves the problem with defining a causal relation simply by a single conditional probability statement.

In this discussion, we will consider only causal models that may be represented by directed acyclic graphs (DAGs), in which there are no mutual (simultaneous) causal relationships (double-headed arrows between two variables) or cycles. A dashed line indicates that a number of unobserved variables may have an influence on the variables at the arrow ends of the line.

As mentioned, DAGs are also referred to as Bayesian networks, or as causal probabilistic networks. The use of the adjective "Bayesian" in "Bayesian networks" refers to the use of conditional probabilities to represent causal relationships. In that context, it refers to model specification, not to the statistical inference procedures used to construct estimates and make tests of hypotheses. (The term "Bayesian network" as an alternative to DAG was coined by Pearl (in 1985) to emphasize three aspects of DAGs for causal modeling: (1) their subjective nature (prior probabilities are often referred to as subjective probabilities); (2) the use of Bayes' formula for updating information; and (3) the distinction between associative (evidential) reasoning and causal reasoning.) To avoid confusion, we shall generally use the more descriptive term "directed acyclic graph" (or DAG) to refer to a causal model, rather than the term "Bayesian network" (but we use both terms). In the title Automated Bayesian Data Fusion Analysis System, the perspective is that Bayesian networks will be used to represent causal relationships, and Bayesian statistical inference will be used to construct estimates of causal effects and make statistical tests of hypotheses.

In Figure 1, the model variables are X1,…,X6. X1 represents unobserved variables affecting X2 and X3. X6 might be an outcome variable of interest, X5 a treatment variable, and X2, X3, and X4 observed covariates.

Bayesian networks provide an exceptional framework for fusing observations from multiple sensors and sources. The model guarantees the consistent integration of new information, which can be input to any node of the model and propagated correctly throughout the rest of the model. The nodes of the model may represent arbitrary random variables, allowing for the integration of disparate and non-commensurate types of data. The data may be discrete or continuous.
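As a concrete illustration of this propagation property, the following sketch fuses two conditionally independent sensor reports on a binary threat node by Bayes' rule. All prior and likelihood values are hypothetical, chosen only for illustration; a fielded network would of course have many nodes and estimated probabilities.

```python
# Fusing two conditionally independent sensor reports on a binary "threat"
# node via Bayes' rule.  All probability values here are hypothetical.
prior = {"present": 0.10, "absent": 0.90}

# P(report = alarm | threat state), one table per sensor (hypothetical).
sensor_a = {"present": 0.85, "absent": 0.05}
sensor_b = {"present": 0.70, "absent": 0.10}

def fuse(belief, likelihoods):
    """Posterior over threat states after observing one alarm per table."""
    unnorm = dict(belief)
    for table in likelihoods:
        for state in unnorm:
            unnorm[state] *= table[state]
    z = sum(unnorm.values())
    return {state: p / z for state, p in unnorm.items()}

# Batch update with both alarms, and sequential update one sensor at a time;
# conditional independence makes the two identical.
batch = fuse(prior, [sensor_a, sensor_b])
sequential = fuse(fuse(prior, [sensor_a]), [sensor_b])
```

Evidence entered at either sensor node shifts the posterior on the threat node in the same way regardless of arrival order, which is the consistency property noted above.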

Estimation of Causal Effects

While Pearl's book tells much about the probabilistic theory of causality and the use of directed acyclic graphs to describe causal relationships and assess estimability of causal effects, it tells little about procedures for estimating the magnitudes of causal relationships (i.e., of causal effects). Pearl's main focus is on specifying criteria for determining whether a causal effect is estimable, given a directed acyclic graph (using criteria such as the "Back Door Criterion" and the "Front Door Criterion"). It is not concerned with numerical procedures for analysis (estimation, testing of hypotheses). Procedures for constructing estimates of causal effects from data (once estimability (identifiability) is established) are described in a number of articles and texts, such as the following:

1. Wooldridge, Jeffrey M., Econometric Analysis of Cross Section and Panel Data, 2nd ed., The MIT Press, 2010 (1st ed. 2002).
2. Heckman, James J., and Edward J. Vytlacil, "Econometric evaluation of social programs, Part I: Causal models, structural models and econometric policy evaluation," Handbook of Econometrics, Vol. 6b, Chapter 70, pp. 4779-4874 (see also Part II, Chapter 71, pp. 4875-5143, and Part III, pp. 5145-5303), Elsevier, 2007. An extract of Part I is Econometric Causality by James J. Heckman, National Bureau of Economic Research Working Paper 13934, April 2008, posted at http://www.nber.org/papers/w13934.
3. Morgan, Stephen L., and Christopher Winship, Counterfactuals and Causal Inference: Methods and Principles for Social Research, Cambridge University Press, 2007.
4. Angrist, Joshua D., and Jörn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press, 2009.
5. Lee, Myoung-Jae, Micro-Econometrics for Policy, Program and Treatment Effects, Oxford University Press, 2005.
6. Greene, William H., Econometric Analysis, 7th ed., Prentice Hall, 2012.


There are two basic approaches to causal analysis, generally referred to as the "statistical" approach and the "econometric" or "regression" approach (or as "conditioning to balance" versus "conditioning to adjust"). The preceding references describe mainly the econometric approach (although the Wooldridge book gives good coverage of both). The statistical approach is described in "The central role of the propensity score in observational studies for causal effects," by Paul R. Rosenbaum and Donald B. Rubin, Biometrika (1983), vol. 70, no. 1, pp. 41-55. The econometric approach consists of specifying a detailed causal model (such as a logistic or probit latent-variable model) and estimating causal effects from that model. The statistical approach consists of stratifying the data so that the probability of treatment is (approximately) the same for all units within a stratum (e.g., propensity-score methods, inverse-probability weighting).
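As a small illustration of the conditioning-to-balance idea, the following sketch compares a naive difference in means with a Horvitz-Thompson inverse-probability-weighted estimate of a treatment effect. The data, the propensity function, and the true effect (2.0) are all invented; in practice the propensity score would itself be estimated, e.g., by logistic regression.

```python
import math
import random

# Inverse-probability weighting with a known propensity score.
# Simulated (hypothetical) data: treatment assignment is confounded by x.
random.seed(1)
n = 20000
rows = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)                    # observed covariate
    e = 1.0 / (1.0 + math.exp(-x))                # propensity P(T = 1 | x)
    t = 1 if random.random() < e else 0           # confounded assignment
    y = 2.0 * t + 1.5 * x + random.gauss(0.0, 1.0)  # true effect = 2.0
    rows.append((e, t, y))

# Naive difference in means is biased upward by the confounder x ...
treated = [y for (e, t, y) in rows if t == 1]
control = [y for (e, t, y) in rows if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

# ... while weighting each unit by 1 / P(its own treatment | x) balances
# the covariate distribution across the two groups.
ipw = sum(t * y / e - (1 - t) * y / (1 - e) for (e, t, y) in rows) / n
```

The weighted estimate recovers the treatment effect that the naive comparison overstates, which is the "balance" that stratification and weighting are designed to achieve.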

The preceding texts illustrate the estimation of causal effects in the social and economic fields. The methodology applies to the field of sensor fusion (simply change the terminology, such as by replacing "treatment" by "enemy action" or "threat," and "estimation of a treatment effect" by "estimation of the effect of an enemy action or threat").

The following estimation techniques will be used in the ABDFAS:

Method of moments
Maximum likelihood (ML) (analytical for simple models, Newton-Raphson for complex ones)
Bayesian estimation
Expectation-Maximization (EM) algorithm (variance estimation will be done using "bootstrapping")
(Optional: Markov Chain Monte Carlo (MCMC) algorithm; Metropolis-Hastings algorithm; Gibbs sampler)
Generalized Estimating Equations (for the exponential family of distributions)
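The Newton-Raphson maximum-likelihood option can be sketched as follows for a one-covariate logistic regression. The data are simulated with hypothetical true parameters (alpha = -1, beta = 2); the iteration solves the 2x2 Newton system in closed form.

```python
import math
import random

# Newton-Raphson ML estimation for logistic regression with one covariate
# plus an intercept.  Data and true parameters are hypothetical.
random.seed(7)
alpha_true, beta_true = -1.0, 2.0
data = []
for _ in range(5000):
    x = random.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(alpha_true + beta_true * x)))
    data.append((x, 1 if random.random() < p else 0))

a, b = 0.0, 0.0                      # starting values
for _ in range(25):                  # Newton-Raphson iterations
    g0 = g1 = 0.0                    # gradient of the log-likelihood
    h00 = h01 = h11 = 0.0            # information matrix (negative Hessian)
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    # Solve H * step = gradient for the 2x2 system and update.
    a += ( h11 * g0 - h01 * g1) / det
    b += (-h01 * g0 + h00 * g1) / det
```

Because the logistic log-likelihood is concave, the iteration settles on the ML estimates after a handful of steps; the inverse of the final information matrix also furnishes the usual large-sample variance estimates.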

References on these estimation procedures, and on Bayesian inference, include the following:

1. Hilbe, Joseph M., and Andrew P. Robinson, Methods of Statistical Model Estimation, CRC Press / Chapman and Hall, 2013.
2. Hardin, James W., and Joseph M. Hilbe, Generalized Estimating Equations, 2nd ed., CRC Press / Chapman and Hall, 2013.
3. Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin, Bayesian Data Analysis, 3rd ed., CRC Press / Chapman and Hall, 2014.
4. Carlin, Bradley P., and Thomas A. Louis, Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed., Chapman & Hall / CRC, 1996.

Procedures for Handling Missing Data

A salient feature of data fusion systems is the occurrence of missing data. For correct statistical inference, it is in general necessary to include the probability distribution of the response (non-missingness) mechanism in the likelihood function. For many purposes, if the response probability does not depend on the missing values of the outcome variable of interest, and if the parameters ψ of the missingness distribution are distinct from the parameters θ of the outcome-variable distribution (θ and ψ are vectors), then, under either the maximum-likelihood or the Bayesian approach to estimation, the response mechanism (and its probability distribution) may be ignored. The reason is that the likelihood factors as fθψ(yobs, response | x) = fθ(yobs | x) fψ(response | x), and, since the second factor does not depend on θ, the value of θ maximizing this full likelihood is the same as that maximizing the likelihood fθ(yobs | x), which ignores the missing-value mechanism. This situation is referred to as "missing at random" (MAR). It is discussed at length in Statistical Analysis with Missing Data, 2nd ed., by Roderick J. A. Little and Donald B. Rubin (Wiley, 2002); see, in particular, pp. 117-124.

Little and Rubin discuss many heuristic methods that are used to handle missing data, including least squares (Yates' method), analysis of covariance (Bartlett's method), Buck's method, complete-case analysis, available-case analysis, weighting, mean imputation, regression imputation, and others. They then discuss likelihood-based approaches, including maximum-likelihood and Bayesian methods and multiple imputation, and they describe practical algorithms for performing the estimation (the EM algorithm, Gibbs sampling, EM applied to Kalman filtering, and others).
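To make the EM idea concrete, the following sketch (entirely invented data) applies EM to a bivariate normal sample in which y is recorded only when x is nonpositive, a MAR mechanism since missingness depends only on the fully observed x. The complete-case mean of y is then badly biased, while the EM (maximum-likelihood) estimate recovers the true mean.

```python
import random

# EM for a bivariate normal with y missing at random (missingness depends
# only on the fully observed x).  All data and parameters are hypothetical.
random.seed(3)
n = 4000
xs, ys, observed = [], [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = 0.8 * x + random.gauss(0.0, 0.6)       # true E[y] = 0
    xs.append(x)
    ys.append(y)
    observed.append(x <= 0.0)                  # y recorded only when x <= 0

# Complete-case mean of y -- biased, because the observed cases have low x.
cc_mean = sum(y for y, o in zip(ys, observed) if o) / sum(observed)

# x is fully observed, so its MLEs are fixed once and for all.
mx = sum(xs) / n
sxx = sum(x * x for x in xs) / n - mx * mx

# EM: E-step fills in expected sufficient statistics for the missing y's;
# M-step applies the closed-form normal MLEs to those statistics.
my, syy, sxy = cc_mean, 1.0, 0.5               # crude starting values
for _ in range(100):
    beta = sxy / sxx
    resid = syy - sxy * sxy / sxx              # Var(y | x) under current fit
    s1y = s2yy = s2xy = 0.0
    for x, y, o in zip(xs, ys, observed):
        ey = y if o else my + beta * (x - mx)  # E[y | x] when missing
        ey2 = y * y if o else ey * ey + resid  # E[y^2 | x] when missing
        s1y += ey
        s2yy += ey2
        s2xy += x * ey
    my = s1y / n
    syy = s2yy / n - my * my
    sxy = s2xy / n - mx * my
```

The iterated E-step/M-step pair is essentially Buck's regression imputation carried to its maximum-likelihood fixed point, which is why the two methods are discussed together by Little and Rubin.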


Figure 2. The Stimulus-Response Matrix for a Two-Alternative ("yes/no") Decision Problem

                                  Response alternative (decision)
Stimulus alternative       S = "yes," a signal            N = "no," no signal
(state of the world,
ground truth)

s, a signal                Correct decision (hit);        Incorrect decision (miss,
                           probability P(S|s)             false negative, Type 2 error);
                           = sensitivity                  probability P(N|s)

n, no signal               Incorrect decision (false      Correct decision (correct
                           positive, false alarm,         rejection); probability
                           Type 1 error); probability     P(N|n) = specificity
                           P(S|n)

P(S|s) + P(N|s) = 1;  P(S|n) + P(N|n) = 1


If the preceding (ignorability) conditions are not satisfied, the problem of dealing with missing values becomes substantially more difficult. It is then necessary to model the missing-data mechanism and to take it into account in the estimation process. Procedures for doing this are addressed in the Little and Rubin book, and in the following:

1. Kim, Jae Kwang, and Jun Shao, Statistical Methods for Handling Incomplete Data, CRC Press / Chapman & Hall, 2014.
2. Molenberghs, Geert, Garrett Fitzmaurice, Michael G. Kenward, Anastasios Tsiatis, and Geert Verbeke, Handbook of Missing Data Methodology, Chapman & Hall / CRC, 2015.
3. Enders, Craig, Applied Missing Data Analysis, The Guilford Press, 2010.
4. Graham, John W., Missing Data: Analysis and Design, Springer, 2012.
5. Allison, Paul D., Missing Data, Sage Publications, 2002.

The ABDFAS will incorporate Bayesian methodology for handling missing values, both for the ignorable (MAR) and nonignorable cases. The nonignorable cases require specification of the missing-data mechanism and inclusion of its probability distribution in the model. This requires special techniques, such as those developed by Rubin and Heckman for causal modeling (latent variable methods, propensity-score based methods; references listed in the section above, "Estimation of Causal Effects").

The missing-data analysis procedures described in this section refer to situations in which it is appropriate to specify a probability distribution for missingness. For massive data gaps, as mentioned earlier, we will simply use Bayesian estimation (i.e., the latest update of the posterior distribution).

Receiver Operating Characteristic (ROC) Analysis

The Automated Bayesian Data Fusion Analysis System will incorporate Receiver Operating Characteristic analysis as the primary methodology for comparing alternative decision systems. The following paragraphs summarize this methodology. Standard references in this methodology are Green, David M. and John A. Swets, Signal Detection Theory and Psychophysics (Wiley, 1966) and Swets, John A., and Ronald M. Pickett, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory (Academic Press, 1982).

For discussion purposes, we consider the problem of assessing accuracy for a two-alternative decision problem, e.g., deciding whether a particular threat is present or not present – a "yes/no" decision problem. This decision must be made from a series of tests (corresponding to multiple sources). The essential characteristics of a two-alternative decision problem are embodied in a stimulus-response matrix, as shown in Figure 2.


Figure 3. The Receiver Operating Characteristic (ROC) Curve

Vertical axis: P(S|s) = 1 − P(Type 2 error) = sensitivity = proportion of hits
Horizontal axis: P(S|n) = P(Type 1 error) = 1 − P(N|n) = 1 − specificity = proportion of false positives


A false positive (or false alarm) is the decision that the threat is present when it is not; in decision theory this is called a Type 1 error. A false negative (or miss) is the decision that the threat is not present, when it is; in decision theory this is called a Type 2 error.

By varying the decision criterion, the probabilities of the two types of error may be adjusted (traded off). By changing the diagnostic procedures (e.g., adding sources or sensors, improving sensors), the probabilities of both types of error may be reduced. A problem that arises is that there are often a large number of stimulus-response matrices (one for each decision criterion), so that this way of describing system performance becomes cumbersome. The challenge is to summarize the performance of a decision system succinctly. A solution to this problem is found in the Receiver Operating Characteristic (ROC) graph, which displays the probability of a correct decision ("hit") versus the probability of a false positive. An example of a ROC curve is shown in Figure 3 (below).

In comparing two decision systems at the same value of P(S|n), the system having the higher value of P(S|s) is selected. In comparing decision systems over a range of values of P(S|n), a more general decision criterion is used, such as selecting the system for which the Bayes risk (expected loss) is smaller.

A considerable number of data fusion problems may be formulated as two-alternative decision problems, i.e., deciding whether a threat is present or not. A standard methodology for estimating the probability that a threat is present is a logistic regression model:

Logistic regression model: λ_i = logit(p_i) = log(p_i / (1 − p_i)) = α + β′x_i = α + Σ_j β_j x_ij

Estimate: p_est = 1 / (1 + exp(−(α_est + β_est′x)))

Decision criterion: decide "yes" if p_est > c and "no" if p_est ≤ c.

Vary c, calculate the proportions of hits and false positives, and plot them on a ROC graph (each value of c yields a different point on the curve).
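The threshold sweep just described can be sketched as follows. The logistic coefficients here are assumed (hypothetical) rather than fitted, and the ground-truth labels and evidence variable are simulated, so the sketch isolates the ROC construction itself.

```python
import math
import random

# Tracing a ROC curve by sweeping the decision threshold c over p_est.
# Coefficients and the data-generating model are hypothetical.
random.seed(11)
a_est, b_est = -1.0, 2.0                 # assumed "fitted" coefficients

cases = []                               # (p_est, threat_present) pairs
for _ in range(4000):
    threat = random.random() < 0.5
    x = random.gauss(1.0 if threat else 0.0, 1.0)  # scalar evidence
    p_est = 1.0 / (1.0 + math.exp(-(a_est + b_est * x)))
    cases.append((p_est, threat))

def roc_point(c):
    """(proportion of hits P(S|s), proportion of false positives P(S|n))."""
    hits = fps = n_signal = n_noise = 0
    for p, threat in cases:
        if threat:
            n_signal += 1
            hits += p > c
        else:
            n_noise += 1
            fps += p > c
    return hits / n_signal, fps / n_noise

# Each threshold c gives one point; together they trace the ROC curve.
curve = [roc_point(c / 20.0) for c in range(21)]
```

As c rises from 0 to 1 the operating point moves from the upper-right corner (everything called a threat) to the lower-left corner (nothing called a threat), tracing the trade-off between the two error types.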

Examples of ROC curves are presented in Logistic Regression Examples Using the SAS System (SAS Institute, 1995). An automated capability for producing ROC curves will be integrated into the ABDFAS (in Phase II), without the need to access statistical software packages (such as SAS or Stata).


Generalized Lagrange Multipliers

The ABDFAS will include an automated capability for sensor management (allocation, tasking). This capability will use the Generalized Lagrange Multiplier method for solving constrained resource-allocation problems. The approach will consist in determining an allocation of sensors to locations (or taskings to sensors) in such a way as to minimize the variance of estimates of interest. This approach will be demonstrated both in simple models in which a closed-form formula exists for the variance, and in general, when this is not the case.

The problem is to determine x* such that H(x*) = max_x H(x), where H(x) = Σ_{i=1 to t} H_i(x_i), subject to the constraints Σ_{i=1 to t} x_i = X and x_i ≥ 0. The function H(x) is additively separable over the sensor locations, and so the Everett Generalized Lagrange Multiplier (GLM) technique can be used to find a solution (see Hugh Everett, III, "Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources," Operations Research, 11:399-411, 1963). The problem becomes one of finding, for each i, the x_i corresponding to

max_{x_i} (H_i(x_i) − λ x_i),

where λ (a Lagrange multiplier) is adjusted so that the constraint Σ_{i=1 to t} x_i = X is satisfied.

The GLM constrained-optimization methodology has been widely used in defense applications. It is both very general and very fast. It was used, for example, for the QUICK General-Purpose War-Game Simulator and for solving many ballistic-warfare problems. The GLM method is very "robust," and can be used to solve constrained optimization problems in the case of nonlinear, discontinuous, nonconvex objective functions. In the present application, the method can be used as long as the sensor locations are not so close that their responses become correlated. This can be assured by requiring that all sensors be located at least a certain distance apart.
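A minimal sketch of the cell-by-cell maximization follows, assuming hypothetical diminishing-returns payoff curves H_i(x) = v_i(1 − e^(−x)) (chosen so each cell maximum has a closed form) and a bisection search on the multiplier λ.

```python
import math

# Everett's Generalized Lagrange Multiplier method for allocating a total
# resource X among t sensor locations.  The payoff curves
# H_i(x) = v_i * (1 - exp(-x)) and the payoff values are hypothetical.
values = [5.0, 3.0, 2.0, 1.0]        # hypothetical per-location payoffs v_i
X = 6.0                              # total resource to allocate

def cell_max(v, lam):
    """x >= 0 maximizing v*(1 - exp(-x)) - lam*x (derivative set to zero)."""
    return math.log(v / lam) if v > lam else 0.0

def allocate(total):
    lo, hi = 1e-9, max(values)       # bracket for the multiplier
    for _ in range(200):             # bisection: resource used is
        lam = 0.5 * (lo + hi)        # monotone decreasing in lambda
        used = sum(cell_max(v, lam) for v in values)
        if used > total:
            lo = lam                 # using too much -> penalize harder
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [cell_max(v, lam) for v in values], lam

x_star, lam_star = allocate(X)
```

At the solution the active cells share a common marginal payoff, v_i·e^(−x_i) = λ, which is the equal-marginal-return property that makes the GLM search both fast and robust; each λ trial requires only t independent one-dimensional maximizations.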

(b) Human/Animal Subjects and/or Recombinant DNA. None.

(4) Related Work

The proposed automated system makes use of known technologies, including Bayesian statistical inference, causal modeling and analysis, handling of missing data and sparse data, Receiver Operating Characteristic analysis, and Lagrangian optimization. The proposed Principal Investigator (Joseph Caldwell) has substantial knowledge of and accomplishments in these techniques. For US Army Communications-Electronics Command he developed the Scenarist Automated Scenario Generation System (SCENARIST); for the US Air Force Systems Command he conducted research in automated scenario generation for tactical theater air warfare applications; for the US Naval Space Systems Activity he developed correlation / tracking methodology for satellite ocean surveillance; he has developed and presented technical seminars on causal modeling and handling of missing and sparse data (small-area statistics), and developed causal models in evaluation projects; he designed a ROC subsystem for a medical diagnosis project; he has applied the method of Generalized Lagrange Multipliers to solve resource-constrained games in defense applications. He has developed and marketed statistical analysis software (e.g., the first commercially available general-purpose Box-Jenkins time series analysis package, TIMES). He holds a PhD degree in mathematical statistics. Much of his education and experience relate directly to this application. Examples of projects that he has directed in topics related to the proposed effort are cited in the section describing key personnel.

(5) Relationship with Future Research or Research and Development

(a) Anticipated Results of the Proposed Approach if the Project Is Successful

If the project is successful and a working ABDFAS is developed, it will be of substantial value in designing, configuring, evaluating, optimizing and comparing data fusion systems. It will accomplish these functions efficiently and effectively, using mathematically rigorous methods. There are numerous data fusion applications that could benefit from such a system, including not only systems to be developed but existing systems as well.


These applications are of "high value," and occur in both defense and non-defense areas; a substantial market exists for the proposed tool.

(b) Significance of the Phase I Effort in Providing a Foundation for Phase II Research or Research and Development Effort

Phase I will assess the feasibility of the proposed approach. Phase II will develop a prototype system that incorporates all important features of the system. This prototype will be a commercially useful system that will embody all essential features in a single (modular) computer program that is easy to use and runs fast.

(c) Clearances, Certifications and Approvals Required to Conduct Phase II Testing

The proposed Principal Investigator (Joseph Caldwell) has worked for many years in defense applications, and has possessed high-level clearances in the past (TS, SCI/SI/TK, Q, SBI). These clearances were issued through approved facilities, including Vista Research Corporation, a secure facility (TS) owned and managed by Dr. Caldwell. The work proposed here will be conducted in a sole proprietorship, which does not have a facility clearance. If the Phase I effort establishes feasibility and it appears that a Phase II contract will be awarded, steps will be taken to set up an appropriate secure facility for the Phase II work.

(6) Commercialization Strategy

Examples of data fusion applications in military and non-military applications were listed at the beginning of the proposal. Upon completion of the Phase II effort, a working Automated Bayesian Data Fusion Analysis System will be available. Descriptive material (hardcopy brochures and electronic files) will be developed to describe the system and illustrate a sample application. A website will be set up that presents this promotional information. Defense agencies and commercial firms that are involved in applications that require data fusion will be contacted (by mail and e-mail), and referred to the website. The product will be advertised in relevant technical and subject-matter publications. Presentations will be made to appropriate conferences relating to data fusion. Procurement resources such as Commerce Business Daily will be reviewed for business opportunities.

The business model to be used to commercialize the product in Phase III will be determined in Phase II. The proposed system is based on causal modeling and analysis, a field that is not widely understood or applied. It is possible that proper use of the ABDFAS may be challenging. Two approaches to the Phase III commercialization will be considered in Phase II. The first is to develop the ABDFAS and license or sell it as a software package. The second is to contract to apply it, or provide training in its use.

(7) Key Personnel

Dr. Joseph Caldwell is proposed as Principal Investigator of the project. He is Principal of the proposing firm (Joseph George Caldwell, PhD). During the course of the project, more than one-half of his time will be spent in the employ of this firm. A summary of his background and experience follows.

Education... Ph.D., Statistics, University of North Carolina at Chapel Hill, 1966; B.S., Mathematics, Carnegie-Mellon University, 1962

Consultant... to government agencies, international agencies and corporations

Director/Supervisor of major projects in...

o strategy and tactics (national security, ballistic missile defense, theater-level operations; game theory; statistical decision theory; optimal allocation of resources; constrained optimization; nonzero sum games; resource-constrained games; asymmetric warfare (terrorism / counterterrorism; guerrilla warfare; nuclear attack by rogue nation); conflict and negotiation; Bayesian statistics; forecasting and control)

o artificial intelligence / expert systems (automated scenario generation for positioning of military units in intelligence / electronic warfare applications)

o multisensor fusion; situation assessment; estimation, prediction, and control; correlation/tracking; satellite surveillance systems (all-source information system for tracking / correlation of ocean vessels)

o simulation and modeling (ocean surveillance, ballistic missile defense, communications-electronics; combat models (general-purpose forces, strategic and tactical))

o systems and software engineering (structured analysis / design; object-oriented design)

o system development (requirements specification / analysis, design, implementation and test)

o test and evaluation (communications-electronics, intelligence / electronic warfare – C4IEW)

o statistical applications (test design, experimental design, fast algorithms, data analysis, data mining, statistical methodology, sample survey design and analysis)

o scientific programming (statistics, optimization, graphics; expert systems, spatial analysis)

o operations research and statistics

o geographic information systems (US Army Corps of Engineers GRASS system; ESRI ArcView)

o game theory (zero-sum and nonzero-sum (Nash bargaining / equilibrium solution), constrained games, ill-conditioned problems; computer solutions of complex games)

o programming languages / development environments / tools / mathematical software packages: C, FORTRAN, Visual Basic, MS-DOS/Windows, UNIX, SAS, SPSS, Statistica, dBASE/FoxPro/Access, SQL, ArcView GIS, MATHCAD, Numerical Recipes, many others

o standards: ISO 9000 Quality Management; ISO 12207 Information Technology; DOD-STD-2167A, MIL-STD-498 Software Development; Carnegie Mellon University Software Engineering Institute Capability Maturity Model (SEI CMM)

Manager of contract research / system development firm (seven years); successful bidder on numerous technical contracts, including four Small Business Innovation Research (SBIR) contracts. Director of more than twenty projects.

Manager of Research and Development and Principal Scientist of the US Army Electronic Proving Ground's (EPG's) Electromagnetic Environmental Test Facility (EMETF).

Adjunct Professor of Statistics at the University of Arizona, Tucson, Arizona

Developer of technical seminars and computer program packages in defense applications, sample survey, forecasting, and geographic information systems.

Selected publications:

1. Caldwell, J. G., T. S. Schreiber, and S.S. Dick, Some Problems in Ballistic Missile Defense Involving Radar Attacks and Imperfect Interceptors, ACDA/ST-145 SR-4, Special Report No. 4, Lambda Corporation / US Arms Control and Disarmament Agency, 1969. Unclassified summary (Optimal Attack and Defense for a Number of Targets in the Case of Imperfect Interceptors, 31 July 2001) of mathematics posted at Internet website http://www.foundationwebsite.org/OptStratTerminalDefense.htm.

2. Caldwell, J. G., Subtractive Overlapping Island Defense with Imperfect Interceptors, ACDA/ST-166, Lambda Corporation / US Arms Control and Disarmament Agency, 1969 (Secret). Unclassified summary (27 August 2001) of mathematics posted at Internet website http://www.foundationwebsite.org/SubtractiveOverlappingIslandDefense.htm.

3. Caldwell, J. G., Documentation for the time series analysis program: TIMES, Lambda Corporation, 1970. Extract: TIMES Box-Jenkins Forecasting System, Reference Manual, Volume I, Technical Background (revised March 1971, reformatted September 2006), posted at http://www.foundationwebsite.org/TIMESVol1TechnicalBackground.pdf.

4. Caldwell, J. G., Conflict, Negotiation, and General-Sum Game Theory, Lambda Paper 45, Lambda Corporation, 1970. Reprint posted at Internet website http://www.foundationwebsite.org/Conflict.htm.

5. Caldwell, J. G., HARDPOINT: A Time-Sharing Model for Analysis of Minuteman Defense, Paper prepared for presentation at the Strategic Nuclear Force Exchange Modeling Symposium, National Military Command System Support Center, Washington, DC, March 23, 1971.

6. Caldwell, J. G., HARDSITE Defense Model, Volume 1 (Mathematical Description) and Volume 2 (Program Description and User's Manual), Lambda Corporation / Office of Deputy Assistant Secretary of Defense (Systems Analysis), 1971.


7. Caldwell, J. G. and J. P. Mayberry, Naval Combat Damage Model, Lambda Corporation / Office of Naval Research, 1972.

8. Caldwell, J. G. and G. E. Pugh, Multiple Resource-Constrained Game Solution, Lambda Corporation, 1972. Also: Caldwell, J. G., Multiple Resource-Constrained Game Solution: Computer Program Description and User's Manual, Lambda Corporation, 1972.

9. Caldwell, J. G. et al., Correlation/Tracking Performance Study -- DCP Input (U), Vols. I and II, Report R-1650, Planning Research Corporation / Navy Space Systems Activity (NAVELEX), 1973 (Secret).

10. Caldwell, J. G. et al., Improvements to the Systems Simulation Program, Vols. I and II, Report R-1801, Planning Research Corporation / Navy Space Systems Activity (NAVELEX), 1974 (Secret).

11. Caldwell, J. G. et al., Dynamic Electromagnetic Combat Effectiveness Model (DESCEM): Measures of Message Delay, US Army Electronic Proving Ground, Electromagnetic Environmental Test Facility, 1984.

12. Caldwell, J. G. et al., Realistic Electromagnetic Environment for Stress Load Testing, US Army Electronic Proving Ground, Electromagnetic Environmental Test Facility, 1984.

13. Caldwell, J. G. et al., Simulation Model Architecture and Intelligence / Electronic Warfare Model Extension, US Army Electronic Proving Ground, Electromagnetic Environmental Test Facility, 1985.

14. "Modeling and Simulation Architecture for Dynamic Electronic System Testing at the US Army Electronic Proving Ground's Electromagnetic Environmental Test Facility," paper presented at the "Government/Industry -- Partners in Testing" US Army Test and Evaluation Symposium, US Army Test and Evaluation Command / American Defense Preparedness Association, Aberdeen Proving Ground, March 19-21, 1985.

15. Caldwell, J. G., Improved Algorithms for Estimation, Prediction and Control, Vista Research Corporation / Office of Naval Research, 1986.

16. Caldwell, J. G., Theater Tactical Air Warfare Methodologies: Automated Scenario Generation, Final Report produced under contract to US Air Force Systems Command, Vista Research Corporation, Sierra Vista, Arizona, 1989.

17. Caldwell, J. George, William N. Goodhue, Sharon K. Hoting, William O. Rasmussen, Fletcher A. K. Aleong, Eric Weiss, Marty Diamond and Christopher S. Caldwell, Scenarist Automated Scenario Generation System, Final Report for the Project, Research in Artificial Intelligence for Noncommunications Electronic Warfare Systems, produced under contract to the US Army Communications-Electronics Command, Vista Research Corporation, Sierra Vista, Arizona, 1991.

18. Caldwell, J. G., Description of the Statistical Subsystem of the Automated Receiver Operating Characteristic System, Western Research Company, Inc., Tucson, Arizona, 1995.

Positions:

2005-present Statistical Consultant, Spartanburg, South Carolina, and Tucson, Arizona, USA. Consultant on research design for impact evaluation of overseas development projects (research design in support of causal analysis).

2001-2005 Management Consultant / System Developer, Clearwater, Florida. System development work in Zambia (funded by US Agency for International Development). (Development of national Education Management System (EMIS) for the Government of Zambia.)

1999-2001 Director of Management Systems, Bank of Botswana (Botswana’s central bank).

1991-1998 Management Consultant / Statistician / System Developer, Clearwater, Florida. (E.g., Lagrangian optimization for Canada Trust (Toronto Dominion Bank); development of the national civil service Personnel Management Information System (PMIS) for the Government of Malawi.)

1989-1991 President, Vista Research Corporation, Tucson, Arizona. Research in artificial intelligence for noncommunications electronic warfare systems (work on automated scenario generation, funded by US Army Communications-Electronic Command).

1982-1991 Director of Research and Development and Principal Scientist, US Army Electronic Proving Ground’s Electromagnetic Environmental Test Facility / Bell Technical Operations and Combustion Engineering; Adjunct Professor of Statistics, University of Arizona; Principal Engineer, Singer Systems and Software Engineering; Tucson and Sierra Vista, Arizona.

1964-1982 Consultant to, or employee of, several contract research firms, including Research Triangle Institute, Lambda Corporation, and Planning Research Corporation. (Operations research, statistics, and information technology.)


(8) Foreign Citizens. None.

(9) Facilities / Equipment. All work will be performed at 1432 N Camino Mateo, Tucson, Arizona, on a Hewlett Packard Envy x360 microcomputer using the Microsoft Windows 8.1 operating system. Development work will be done using Microsoft development software (Visual Basic, Access, Excel) and presentation software (PowerPoint, Word). Statistical analysis (simulation, statistical inference (estimation, hypothesis testing), ROC analysis) will be done (in Phase I) using the Stata statistical analysis software system.

(10) Subcontractors / Consultants. None.

(11) Prior, Current, or Pending Support of Similar Proposals or Awards. No prior, current, or pending support for proposed work.


