The Temporal Logic of Causal Structures

The paper

In this work we introduced a new method, using temporal logic, for reasoning about and inferring causal relationships. We use philosophical definitions of and show that these may be easily translated into the framework of temporal logic. Then, relationships satisfied by the system are found using model checking and their significance is verified using techniques for false discovery control.
[pdf] [bibtex]

Quick links

Comparison with other methods and full results

Due to space limitations, we presented the results for only one synthetic MEA data set in the paper. Here we discuss results from our method on all data sets, and compare these with the results achieved using other methods.

Data:

The synthetic MEA data was generated to simulate the firings of a set of neurons for which the connectivity (which neurons cause others to fire) is known. While it was originally created as a data mining challenge for a KDD workshop on temporal data mining, it is a prime candidate for causal inference. It gives a case where many of the common assumptions hold: there are no causes of a neuron's firing other than the firing of other neurons, and all neurons have been measured; there is no possibility of Simpson's paradox; and the structure from which the data is generated is Markovian.

Data was generated for five different embedded causal structures, such that there were four data sets for each structure: two generated with a low noise level and two with a higher noise level. Each data set contains 100,000 firings of 26 neurons, each represented by a letter of the English alphabet. For more information on the data and challenge, see: [link].

Method:

We compared our approach to two others:

  • The PC algorithm of Spirtes, Glymour and Scheines, as implemented in Tetrad IV, using the default significance test (Chi square) and parameters (alpha=0.05, depth=-1).
  • The Granger test for causality, as implemented in the MSBVAR R package, using a lag of 20 time units. This method returns both p-values and f-statistics for each relationship. We used the f-statistics, as these were better able to discriminate amongst the hypotheses. Finally, we chose the threshold for when a null hypotheses would be rejected (i.e. when the relationship would be called "causal"), using the same fdr method as for our temporal logic based approach.

Note that all results are shown for 18 data sets. For pattern 1, there are only one low and one high noise data set containing as many firings as the others. The other two sets generated for pattern 1 contain only 10,000 firings and were omitted from testing with all methods in order to fairly compare the results.

With our method, the hypotheses tested were of the form: one neuron causes another to fire within 20-40 units of time after the causal neuron fires. With all methods we tested all pairs of neurons.

Summary of results:

We have three sets of statistics computed for each method. First, we took the false discovery rate (FDR), which is the proportion of false discoveries (false positives) as a proportion of all discoveries. In the table below, we see that our method had a very low false discovery rate, while the rate for Granger causality was nearly 50%. Note that we used the same FDR procedure for determining which hypotheses would be deemed significant for the Granger test as for our own. The difference was that many of the most significant hypotheses found by the Granger test were erroneous. The PC algorithm, unlike the other two, returns both directed and undirected edges. In computing the FDR and FNR we omitted the undirected edges, in order to better compare the three methods (the graphs below show all edges, including undirected and bidirected ones). Most of the discoveries by the PC algorithm were in fact false.

Next, we look at the false non-discovery rate (FNR, also called the false negative rate). It may be that an algorithm makes few false discoveries, but at the expense of missing many opportunities for discovery (false negatives). The FNR is proportion of false negatives as a proportion of all negative results (here that means the proportion of relationships falsely called non-causal as a proportion of all relationships called non-causal). Due to the large number of hypotheses tested, even though some algorithms missed many edges, the proportion still looks quite low.

Finally, since data was generated twice for each combination of parameters (structure and noise level), we looked at how many of the relationships were found in both runs. The intersection is the number of relationships that were found in both runs, as a proportion of all relationships found. That is, with 8 pairs of data sets (ignoring pattern 1, for which we only tested one low and high noise level), we sum the number of common relationships in each pair and divide by the sum of all relationships found. Our method found nearly the same relationships (~95%) in both runs for a particular parameter value, while the Granger test had fewer commonalities at ~75% and the PC algorithm generally found completely different sets of relationships (~7% intersection).

Algorithm FDR FNR Intersection
Temporal Logic 0.0093 0.0005 0.9583
Granger 0.5079 0.0026 0.7530
PC 0.9608 0.0159 0.0671

Detailed results

Below we show graphs of each embedded structure (labeled "True structure") and the graphs returned by the three methods. For our testing as well as the Granger tests, we used the same fdr procedure in order to determine which relationships would be deemed significant (i.e. those in the graphs). The full set of histograms analogous to that in the paper are shown here.

The interpretation of each is as follows. The PC graph denotes "neuron A causes neuron B to fire" with a directed edge from A to B. There are also undirected edges, meaning that there is a relationship between A and B but its direction is unknown. The Granger graphs contain only directed arrows, denoting that the neuron at the tail causes the neuron at the head to fire in exactly 20 time units. In our graphs, (labeled "Temporal logic"), the arrows mean that the neuron at the tail causes the neuron at the head to fire in greater than or equal to 20 and less than or equal to 40 time units after the tail neuron fires.

In all graphs there are three types of edges. Since there were two data sets generated for each set of parameters (structure and noise level), we combined the results into a single graph for each method and parameter. Solid edges denote relationships found in both of these "runs", pink dotted edges denote relationships found in the first run but not the second and green dotted edges denote relationships found in the second run but not the first.

Results for each structure

One (scatter gather), Two (two separate scatter gather graphs), Three (a chain of scatter gather relationships), Four (a binary tree), and Five (a long chain of neurons).

Graph legend

Pattern 1

 

True structure:

Embedded structure one

Results, low noise datasets:

Tetrad run on pattern one with low noise

PC


Granger test run on pattern one with low noise

Granger


Our test run on pattern one with low noise

Temporal logic

 

 

Results, high noise datasets:

Tetrad run on pattern one with high noise

PC


Granger test run on pattern one with high noise

Granger


Our test run on pattern one with high noise

Temporal logic

 

Pattern 2

 

True structure:

Embedded structure two

Results, low noise datasets:

Tetrad run on pattern two with low noise

PC


Granger test run on pattern two with low noise

Granger


Our test run on pattern two with low noise

Temporal logic

 

 

Results, high noise datasets:

Tetrad run on pattern two with high noise

PC


Granger test run on pattern two with high noise

Granger


Our test run on pattern two with high noise

Temporal logic

 

Pattern 3

 

True structure:

Embedded structure three

Results, low noise datasets:

Tetrad run on pattern three with low noise

PC


Granger test run on pattern three with low noise

Granger


Our test run on pattern three with low noise

Temporal logic

 

 

Results, high noise datasets:

Tetrad run on pattern three with high noise

PC


Granger test run on pattern three with high noise

Granger


Our test run on pattern three with high noise

Temporal logic

 

Pattern 4

 

True structure:

Embedded structure four

Results, low noise datasets:

Tetrad run on pattern four with low noise

PC


Granger test run on pattern four with low noise

Granger


Our test run on pattern four with low noise

Temporal logic

 

 

Results, high noise datasets:

Tetrad run on pattern four with high noise

PC


Granger test run on pattern four with high noise

Granger


Our test run on pattern four with high noise

Temporal logic

 

Pattern 5

True structure:

Embedded structure five
 

Results, low noise datasets:

Tetrad run on pattern five with low noise

PC


Granger test run on pattern five with low noise

Granger


Our test run on pattern five with low noise

Temporal logic

 

 

Results, high noise datasets:

Tetrad run on pattern five with high noise

PC


Granger test run on pattern five with high noise

Granger


Our test run on pattern five with high noise

Temporal logic