Workflow module

ConditionalDependence

Class containing methods for calculating conditional dependence.

Source code in xicorpy/conditional_dependence.py
class ConditionalDependence:
    """Class containing methods for calculating conditional dependence."""

    def __init__(self, y: npt.ArrayLike, z: npt.ArrayLike):
        """

        Initialize and Validate a ConditionalDependence object.

        You can then pass any X values to `compute_conditional_dependence` and `compute_conditional_dependence_1d`

        Args:
            y (npt.ArrayLike): A single list or 1D array or a pandas Series.
            z (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

        Raises:
            ValueError: If y is not 1d.
            ValueError: If z is not 1d or 2d.
            ValueError: If y and z have different lengths.
            ValueError: If there are <= 2 valid y values.
        """
        self.y_, self.z_df = validate_and_prepare_for_conditional_dependence(y, z)
        self.y = y
        self.z = z

    def _validate_x(self, x: npt.ArrayLike):
        if not 1 <= np.ndim(x) <= 2:
            raise ValueError("x must be a 1D or 2D array")

        x_shape = np.shape(x)
        y_shape = np.shape(self.y)
        if x_shape[0] != y_shape[0]:
            raise ValueError("x must have the same number of samples as y")

    def compute_conditional_dependence(self, x: npt.ArrayLike = None):
        """
        Compute the conditional dependence coefficient based on:
            [Azadkia and Chatterjee (2021). "A simple measure of conditional dependence", Annals of Statistics](https://arxiv.org/abs/1910.12327)

        If X is passed, computes `T(Y, Z|X)`, where `T` is the conditional dependence coefficient. Otherwise, computes `T(Y, Z)`.

        The conditional dependence coefficient lies between 0 and 1, and is

            0 if Y is conditionally independent of Z given X
            1 if Y is a measurable function of Z given X

        Args:
            x (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

        Returns:
            float: Conditional dependence coefficient.

        Raises:
            ValueError: If x is passed and does not have the same number of rows as y.

        """
        if x is None:
            return _conditional_dependence_no_x(self.y_, self.z_df)
        else:
            self._validate_x(x)
            x_ = convert_to_numeric(pd.DataFrame(x)).loc[self.y_.index]
            return _conditional_dependence_with_x(self.y_, self.z_df, x_)

    def compute_conditional_dependence_1d(self, x: npt.ArrayLike = None):
        """
        Computes conditional dependence of y on **each column** of z individually.

        Use when you want to compute `T(Y, Z_j|X)` for each column of Z.

        Args:
            x (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

        Returns:
            dict: Keys are column names (or indices if z is not a pandas object), and values are conditional dependence coefficients.

        Raises:
            ValueError: If x is passed and does not have the same number of rows as y.

        """
        if x is None:
            return _conditional_dependence_each_z_no_x(self.y_, self.z_df)
        else:
            self._validate_x(x)
            x_ = convert_to_numeric(pd.DataFrame(x)).loc[self.y_.index]
            return _conditional_dependence_with_each_z(self.y_, self.z_df, x_)

__init__(y, z)

Initialize and validate a ConditionalDependence object.

y and z are validated once at construction, so you can then pass any X values to compute_conditional_dependence and compute_conditional_dependence_1d.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | npt.ArrayLike | A single list or 1D array or a pandas Series. | required |
| z | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If y is not 1d. |
| ValueError | If z is not 1d or 2d. |
| ValueError | If y and z have different lengths. |
| ValueError | If there are <= 2 valid y values. |

Source code in xicorpy/conditional_dependence.py
def __init__(self, y: npt.ArrayLike, z: npt.ArrayLike):
    """

    Initialize and Validate a ConditionalDependence object.

    You can then pass any X values to `compute_conditional_dependence` and `compute_conditional_dependence_1d`

    Args:
        y (npt.ArrayLike): A single list or 1D array or a pandas Series.
        z (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

    Raises:
        ValueError: If y is not 1d.
        ValueError: If z is not 1d or 2d.
        ValueError: If y and z have different lengths.
        ValueError: If there are <= 2 valid y values.
    """
    self.y_, self.z_df = validate_and_prepare_for_conditional_dependence(y, z)
    self.y = y
    self.z = z

compute_conditional_dependence(x=None)

Compute the conditional dependence coefficient of Azadkia and Chatterjee (2021), "A simple measure of conditional dependence", Annals of Statistics (https://arxiv.org/abs/1910.12327).

If X is passed, computes T(Y, Z|X), where T is the conditional dependence coefficient. Otherwise, computes T(Y, Z).

The conditional dependence coefficient lies between 0 and 1, and is

0 if Y is conditionally independent of Z given X
1 if Y is a measurable function of Z given X

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Conditional dependence coefficient. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If x is passed and does not have the same number of rows as y. |
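For the case where no X is passed, the estimator from the cited paper can be sketched in pure Python. This is a minimal illustration written from the formula in Azadkia and Chatterjee (2021), not the code in xicorpy: `t_n_unconditional` and `_nearest` are hypothetical names, the neighbour search is brute force, and ties are resolved by taking the first index, whereas the paper breaks ties at random.

```python
import random
from typing import List, Sequence


def _nearest(points: List[Sequence[float]], i: int) -> int:
    """Index of the nearest neighbour of points[i] (squared Euclidean, brute force)."""
    def dist2(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    # min() keeps the first index on ties; the paper breaks ties at random.
    return min((j for j in range(len(points)) if j != i), key=dist2)


def t_n_unconditional(y: Sequence[float], z: List[Sequence[float]]) -> float:
    """Sketch of T_n(Y, Z) with an empty conditioning set."""
    n = len(y)
    r = [sum(yj <= yi for yj in y) for yi in y]       # R_i = #{j : y_j <= y_i}
    l_rank = [sum(yj >= yi for yj in y) for yi in y]  # L_i = #{j : y_j >= y_i}
    m = [_nearest(z, i) for i in range(n)]            # M(i): nearest neighbour of z_i
    num = sum(n * min(r[i], r[m[i]]) - l_rank[i] ** 2 for i in range(n))
    den = sum(l_rank[i] * (n - l_rank[i]) for i in range(n))
    return num / den


n = 100
z = [[float(i)] for i in range(n)]
y_func = [row[0] ** 2 for row in z]         # y is a function of z
rng = random.Random(0)
y_noise = [rng.random() for _ in range(n)]  # y is unrelated to z

t_func = t_n_unconditional(y_func, z)
t_noise = t_n_unconditional(y_noise, z)
print(t_func, t_noise)
```

The first value should be close to 1 and the second close to 0. The bounds of 0 and 1 hold for the population quantity; a finite-sample estimate can fall slightly below 0.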

Source code in xicorpy/conditional_dependence.py
def compute_conditional_dependence(self, x: npt.ArrayLike = None):
    """
    Compute the conditional dependence coefficient based on:
        [Azadkia and Chatterjee (2021). "A simple measure of conditional dependence", Annals of Statistics](https://arxiv.org/abs/1910.12327)

    If X is passed, computes `T(Y, Z|X)`, where `T` is the conditional dependence coefficient. Otherwise, computes `T(Y, Z)`.

    The conditional dependence coefficient lies between 0 and 1, and is

        0 if Y is conditionally independent of Z given X
        1 if Y is a measurable function of Z given X

    Args:
        x (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

    Returns:
        float: Conditional dependence coefficient.

    Raises:
        ValueError: If x is passed and does not have the same number of rows as y.

    """
    if x is None:
        return _conditional_dependence_no_x(self.y_, self.z_df)
    else:
        self._validate_x(x)
        x_ = convert_to_numeric(pd.DataFrame(x)).loc[self.y_.index]
        return _conditional_dependence_with_x(self.y_, self.z_df, x_)

compute_conditional_dependence_1d(x=None)

Computes conditional dependence of y on each column of z individually.

Use when you want to compute T(Y, Z_j|X) for each column of Z.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |

Returns:

| Type | Description |
| --- | --- |
| dict | Keys are column names (or indices if z is not a pandas object), and values are conditional dependence coefficients. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If x is passed and does not have the same number of rows as y. |

Source code in xicorpy/conditional_dependence.py
def compute_conditional_dependence_1d(self, x: npt.ArrayLike = None):
    """
    Computes conditional dependence of y on **each column** of z individually.

    Use when you want to compute `T(Y, Z_j|X)` for each column of Z.

    Args:
        x (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

    Returns:
        dict: Keys are column names (or indices if z is not a pandas object), and values are conditional dependence coefficients.

    Raises:
        ValueError: If x is passed and does not have the same number of rows as y.

    """
    if x is None:
        return _conditional_dependence_each_z_no_x(self.y_, self.z_df)
    else:
        self._validate_x(x)
        x_ = convert_to_numeric(pd.DataFrame(x)).loc[self.y_.index]
        return _conditional_dependence_with_each_z(self.y_, self.z_df, x_)

compute_conditional_dependence(y, z, x=None)

Compute the conditional dependence coefficient of Azadkia and Chatterjee (2021), "A simple measure of conditional dependence", Annals of Statistics (https://arxiv.org/abs/1910.12327).

If X is passed, computes T(Y, Z|X), where T is the conditional dependence coefficient. Otherwise, computes T(Y, Z).

The conditional dependence coefficient lies between 0 and 1, and is

0 if Y is conditionally independent of Z given X
1 if Y is a measurable function of Z given X

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | npt.ArrayLike | A single list or 1D array or a pandas Series. | required |
| z | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |
| x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Conditional dependence coefficient. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If y is not 1d. |
| ValueError | If z is not 1d or 2d. |
| ValueError | If y and z have different lengths. |
| ValueError | If there are <= 2 valid y values. |
| ValueError | If x is passed and does not have the same number of rows as y. |
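When X is passed, the paper's estimator compares two nearest-neighbour lookups: one using X alone and one using X and Z together. The following is a minimal pure-Python sketch of that formula, not the xicorpy implementation. The names `t_n_conditional` and `_nearest` are hypothetical, inputs are assumed to be lists of rows, the search is brute force, and ties go to the first index rather than being broken at random as in the paper.

```python
import random
from typing import List, Sequence


def _nearest(points: List[Sequence[float]], i: int) -> int:
    """Index of the nearest neighbour of points[i] (squared Euclidean, brute force)."""
    def dist2(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    # min() keeps the first index on ties; the paper breaks ties at random.
    return min((j for j in range(len(points)) if j != i), key=dist2)


def t_n_conditional(
    y: Sequence[float], z: List[Sequence[float]], x: List[Sequence[float]]
) -> float:
    """Sketch of T_n(Y, Z | X) for a non-empty conditioning set X."""
    n = len(y)
    r = [sum(yj <= yi for yj in y) for yi in y]          # R_i = #{j : y_j <= y_i}
    xz = [list(xr) + list(zr) for xr, zr in zip(x, z)]   # rows of (X, Z) stacked
    n_idx = [_nearest(x, i) for i in range(n)]           # N(i): neighbour in X only
    m_idx = [_nearest(xz, i) for i in range(n)]          # M(i): neighbour in (X, Z)
    num = sum(min(r[i], r[m_idx[i]]) - min(r[i], r[n_idx[i]]) for i in range(n))
    den = sum(r[i] - min(r[i], r[n_idx[i]]) for i in range(n))
    return num / den


rng = random.Random(0)
n = 300
x = [[rng.random()] for _ in range(n)]
z = [[rng.random()] for _ in range(n)]
y = [xr[0] + zr[0] for xr, zr in zip(x, z)]  # given x, y is a function of z

t_cond = t_n_conditional(y, z, x)
print(t_cond)
```

In this example Y is fully determined by Z once X is known, so the estimate should be close to 1. The denominator vanishes when Y is itself a function of X, a case the estimator is not meant for.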

Source code in xicorpy/conditional_dependence.py
def compute_conditional_dependence(
    y: npt.ArrayLike, z: npt.ArrayLike, x: npt.ArrayLike = None
) -> float:
    """
    Compute the conditional dependence coefficient based on:
        [Azadkia and Chatterjee (2021). "A simple measure of conditional dependence", Annals of Statistics](https://arxiv.org/abs/1910.12327)

    If X is passed, computes `T(Y, Z|X)`, where `T` is the conditional dependence coefficient. Otherwise, computes `T(Y, Z)`.

    The conditional dependence coefficient lies between 0 and 1, and is

        0 if Y is conditionally independent of Z given X
        1 if Y is a measurable function of Z given X

    Args:
        y (npt.ArrayLike): A single list or 1D array or a pandas Series.
        z (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.
        x (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

    Returns:
        float: Conditional dependence coefficient.

    Raises:
        ValueError: If y is not 1d.
        ValueError: If z is not 1d or 2d.
        ValueError: If y and z have different lengths.
        ValueError: If there are <= 2 valid y values.
        ValueError: If x is passed and does not have the same number of rows as y.

    """
    return ConditionalDependence(y, z).compute_conditional_dependence(x)

compute_conditional_dependence_1d(y, z, x=None)

Computes conditional dependence of y on each column of z individually.

Use when you want to compute T(Y, Z_j|X) for each column of Z.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | npt.ArrayLike | A single list or 1D array or a pandas Series. | required |
| z | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |
| x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[Union[str, int], float] | Keys are column names (or indices if z is not a pandas object), and values are conditional dependence coefficients. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If y is not 1d. |
| ValueError | If z is not 1d or 2d. |
| ValueError | If y and z have different lengths. |
| ValueError | If there are <= 2 valid y values. |
| ValueError | If x is passed and does not have the same number of rows as y. |
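The per-column behaviour and the shape of the returned dict can be illustrated with a small pure-Python sketch. It is not the library's code: `per_column_dependence` and `_t_n_1d` are hypothetical names, a plain mapping of column name to values stands in for a pandas DataFrame, a list of rows stands in for a 2D array, and only the case without X is covered. The scoring function is a compact version of the unconditional estimator from the cited paper, with ties going to the first index.

```python
import random
from collections.abc import Mapping
from typing import Dict, Sequence, Union


def _t_n_1d(y: Sequence[float], z_col: Sequence[float]) -> float:
    """Compact sketch of T_n(Y, Z_j) for a single column Z_j, no conditioning set."""
    n = len(y)
    r = [sum(yj <= yi for yj in y) for yi in y]       # R_i = #{j : y_j <= y_i}
    l_rank = [sum(yj >= yi for yj in y) for yi in y]  # L_i = #{j : y_j >= y_i}

    def nn(i: int) -> int:
        # Nearest neighbour of z_col[i]; first index wins on ties.
        return min((j for j in range(n) if j != i),
                   key=lambda j: abs(z_col[j] - z_col[i]))

    num = sum(n * min(r[i], r[nn(i)]) - l_rank[i] ** 2 for i in range(n))
    den = sum(l_rank[i] * (n - l_rank[i]) for i in range(n))
    return num / den


def per_column_dependence(y, z) -> Dict[Union[str, int], float]:
    """Score each column of z separately and key the result by column."""
    if isinstance(z, Mapping):
        columns = dict(z)  # named columns keep their names as keys
    else:
        # List of rows: transpose, and use positional indices as keys.
        columns = {j: [row[j] for row in z] for j in range(len(z[0]))}
    return {key: _t_n_1d(y, col) for key, col in columns.items()}


rng = random.Random(0)
n = 50
y = [float(i) for i in range(n)]
z_named = {
    "signal": [2.0 * v for v in y],             # determines y exactly
    "noise": [rng.random() for _ in range(n)],  # unrelated to y
}
z_rows = [[s, e] for s, e in zip(z_named["signal"], z_named["noise"])]

by_name = per_column_dependence(y, z_named)
by_index = per_column_dependence(y, z_rows)
print(list(by_name))   # ['signal', 'noise']
print(list(by_index))  # [0, 1]
```

Scoring columns one at a time like this is a screening view: it ranks individual columns of Z but says nothing about what several columns explain jointly.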

Source code in xicorpy/conditional_dependence.py
def compute_conditional_dependence_1d(
    y: npt.ArrayLike, z: npt.ArrayLike, x: npt.ArrayLike = None
) -> Dict[Union[str, int], float]:
    """
    Computes conditional dependence of y on **each column** of z individually.

    Use when you want to compute `T(Y, Z_j|X)` for each column of Z.

    Args:
        y (npt.ArrayLike): A single list or 1D array or a pandas Series.
        z (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.
        x (npt.ArrayLike): A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.

    Returns:
        dict: Keys are column names (or indices if z is not a pandas object), and values are conditional dependence coefficients.

    Raises:
        ValueError: If y is not 1d.
        ValueError: If z is not 1d or 2d.
        ValueError: If y and z have different lengths.
        ValueError: If there are <= 2 valid y values.
        ValueError: If x is passed, and does not have the same number of rows as y.

    """
    return ConditionalDependence(y, z).compute_conditional_dependence_1d(x)