Workflow module
XiCorrelation
Class containing Xi Correlation computation components
Source code in xicorpy/correlation.py
__init__(x, y=None)
If only x is passed, computes correlation between each column of x. If y is also passed, computes correlation between each column of x vs each column of y.
If only x is passed, x MUST be 2-d. Otherwise, both x and y can be 1-d.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |
y | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |
Raises:
Type | Description |
---|---|
ValueError | If x and y are not of the same shape. |
ValueError | If there are fewer than 2 columns to compute correlation. |
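The input rules above can be pictured with a small pure-Python sketch. It is not the library's code: `as_columns` and `column_pairs` are hypothetical helpers, a list of lists is assumed to be a list of rows, "same shape" is assumed to mean the same number of rows, and including every ordered pair (with the diagonal) in the x-only case is also an assumption.

```python
from itertools import product


def as_columns(data):
    """Return a list of columns: a flat list becomes one column,
    a list of rows is transposed (assumption: rows, not columns)."""
    if data and not isinstance(data[0], (list, tuple)):
        return [list(data)]
    return [list(col) for col in zip(*data)]


def column_pairs(x, y=None):
    """List the (x column, y column) index pairs that would be compared."""
    x_cols = as_columns(x)
    if y is None:
        # Only x given: it must supply at least two columns to compare.
        if len(x_cols) < 2:
            raise ValueError("need at least 2 columns when only x is passed")
        return list(product(range(len(x_cols)), repeat=2))
    y_cols = as_columns(y)
    # Assumption: "same shape" means the same number of observations.
    if len(x_cols[0]) != len(y_cols[0]):
        raise ValueError("x and y must have the same number of rows")
    return list(product(range(len(x_cols)), range(len(y_cols))))


pairs_x_only = column_pairs([[1, 2], [3, 4], [5, 6]])  # 3 rows, 2 columns
pairs_1d = column_pairs([1, 2, 3], [4, 5, 6])          # two 1-d inputs
```

Ordered pairs are listed in both directions because the Xi coefficient is not symmetric in its arguments.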
compute_xi(get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)
Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.
Xi Coefficient based on
Modified Xi Coefficient based on Lin and Han (2021). "On boosting the power of Chatterjee's rank correlation"
The modified Xi Coefficient uses the M nearest neighbours of each point to compute the correlation, which lets the estimate converge much faster at a slightly higher computational cost. For very large data, the two coefficients are likely to be very similar. We recommend using the modified Xi Coefficient.
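For orientation, the original (unmodified) coefficient can be sketched in a few lines of pure Python. This is a minimal illustration of the standard no-ties formula for Chatterjee's rank correlation, 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1), where r holds the ranks of y after sorting the pairs by x. The function name `xi_no_ties` is hypothetical, and the sketch is not taken from xicorpy.

```python
def xi_no_ties(x, y):
    """Chatterjee's xi for paired samples, assuming no ties in x or y."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least 2 observations")
    # Rank of each y value (1 = smallest).
    order_y = sorted(range(n), key=lambda i: y[i])
    rank = [0] * n
    for r, i in enumerate(order_y, start=1):
        rank[i] = r
    # Visit the points in increasing x and add up the jumps in y-rank.
    order_x = sorted(range(n), key=lambda i: x[i])
    jumps = sum(abs(rank[b] - rank[a]) for a, b in zip(order_x, order_x[1:]))
    return 1 - 3 * jumps / (n * n - 1)


xi_monotone = xi_no_ties([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
xi_parabola = xi_no_ties([-2, -1, 0, 1, 2], [4.1, 1.1, 0, 1, 4])
```

For a perfectly monotone relationship the value is 1 - 3 / (n + 1), so it only approaches 1 as n grows; this slow convergence on small samples is the behaviour the modified coefficient is meant to improve.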
Parameters:
Name | Type | Description | Default |
---|---|---|---|
get_modified_xi | bool | Should the modified xi be computed? Defaults to True when there are no ties, and False when ties are present. | None |
m_nearest_neighbours | int | Only used when get_modified_xi is True. Defaults to square-root of array size. | None |
get_p_values | bool | Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0). | False |
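The way the None defaults resolve can be sketched as follows. `resolve_defaults` is a hypothetical helper, not part of xicorpy; checking for ties in either x or y, and rounding the square root down, are assumptions made for illustration only.

```python
import math


def resolve_defaults(x, y, get_modified_xi=None, m_nearest_neighbours=None):
    """Fill in the documented defaults for one pair of 1-d samples."""
    # Assumption: a repeated value in either sample counts as a tie.
    has_ties = len(set(x)) < len(x) or len(set(y)) < len(y)
    if get_modified_xi is None:
        get_modified_xi = not has_ties
    # M only matters for the modified coefficient.
    if get_modified_xi and m_nearest_neighbours is None:
        # Assumption: "square-root of array size" is rounded down.
        m_nearest_neighbours = math.isqrt(len(x))
    return get_modified_xi, m_nearest_neighbours


no_ties = resolve_defaults([1, 2, 3, 4], [4, 3, 2, 1])
with_ties = resolve_defaults([1, 2, 2], [1, 2, 3])
forced = resolve_defaults([1, 2, 2, 5], [1, 2, 3, 4], get_modified_xi=True)
```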
Returns:
Type | Description |
---|---|
Union[_RetType, Tuple[_RetType, _RetType]] | float/np.ndarray/pd.DataFrame |
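As a rough guide to what a p-value for the null hypothesis xi = 0 means here: for the original coefficient, when Y is continuous and independent of X, sqrt(n) * xi is asymptotically normal with mean 0 and variance 2/5. The sketch below turns that into a one-sided p-value. `xi_p_value` is a hypothetical name, and whether xicorpy computes its p-values this way is an assumption, not something stated above.

```python
import math


def xi_p_value(xi, n):
    """One-sided asymptotic p-value for H0: Y is independent of X.

    Assumes no ties in Y and a sample size n large enough for the
    normal approximation sqrt(n) * xi ~ N(0, 2/5) to be reasonable.
    """
    z = math.sqrt(n) * xi / math.sqrt(2 / 5)
    # Upper tail of the standard normal: 1 - Phi(z).
    return 0.5 * math.erfc(z / math.sqrt(2))


p_null = xi_p_value(0.0, 100)
p_weak = xi_p_value(0.1, 100)
p_strong = xi_p_value(0.5, 100)
```

The test is one-sided because large positive values of xi are the ones that indicate dependence.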
compute_xi_correlation(x, y=None, get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)
Helper function to compute the Xi Coefficient; uses the class machinery from XiCorrelation.
Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.
Xi Coefficient based on
Modified Xi Coefficient based on Lin and Han (2021). "On boosting the power of Chatterjee's rank correlation"
The modified Xi Coefficient uses the M nearest neighbours of each point to compute the correlation, which lets the estimate converge much faster at a slightly higher computational cost. For very large data, the two coefficients are likely to be very similar. We recommend using the modified Xi Coefficient.
If only x is passed, computes correlation between each column of x. If y is also passed, computes correlation between each column of x vs each column of y.
If only x is passed, x MUST be 2-d. Otherwise, both x and y can be 1-d.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |
y | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |
get_modified_xi | bool | Should the modified xi be computed? By default this is True when there are no ties and False when ties are present. | None |
m_nearest_neighbours | int | Only used if get_modified_xi is True. | None |
get_p_values | bool | Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0). | False |
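Since get_modified_xi falls back to False when ties are present, it may help to see how the original coefficient copes with tied y values. The sketch below uses Chatterjee's general formula, 1 - n * sum(|r[i+1] - r[i]|) / (2 * sum(l[i] * (n - l[i]))), where r[i] counts the y values less than or equal to the i-th one and l[i] counts those greater than or equal to it, both taken in x-sorted order. `xi_with_ties` is a hypothetical name; the quadratic-time counting is chosen for readability, and ties in x are simply left in input order here, which is an assumption of this sketch.

```python
def xi_with_ties(x, y):
    """Chatterjee's xi using the general formula that allows ties in y."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    # Stable sort: tied x values keep their input order (assumption).
    order_x = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order_x]
    # r[i]: how many y values are <= this one; l[i]: how many are >= it.
    r = [sum(1 for v in y if v <= yi) for yi in y_sorted]
    l = [sum(1 for v in y if v >= yi) for yi in y_sorted]
    numerator = n * sum(abs(b - a) for a, b in zip(r, r[1:]))
    denominator = 2 * sum(li * (n - li) for li in l)
    if denominator == 0:
        raise ValueError("y is constant, so xi is undefined")
    return 1 - numerator / denominator


xi_tied = xi_with_ties([1, 2, 3, 4], [0, 0, 1, 1])
xi_untied = xi_with_ties([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

Without ties the denominator reduces to n * (n^2 - 1) / 3, so the result matches the simpler no-ties formula.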
Returns:
Type | Description |
---|---|
Union[_RetType, Tuple[_RetType, _RetType]] | float/np.ndarray/pd.DataFrame |