FOCI module
FOCI
Class for computing FOCI.
__init__(y, x)
Initialize and validate the FOCI object.
You can then use the select_features method to select features.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | npt.ArrayLike | A single list, 1D array, or pandas Series. | required |
x | npt.ArrayLike | A single list, list of lists, 1D/2D numpy array, pd.Series, or pd.DataFrame. | required |
Raises:
Type | Description |
---|---|
ValueError | If y is not 1d. |
ValueError | If x is not 1d or 2d. |
ValueError | If y and x have different lengths. |
ValueError | If there are <= 2 valid y values. |
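The four checks listed under Raises can be sketched with NumPy alone. This is a minimal, hypothetical helper (`validate_inputs` is not part of the module's documented API). It assumes that a "valid" y value means a non-NaN one and that a list of lists holds one inner list per observation; the real class may interpret either differently.

```python
import numpy as np


def validate_inputs(y, x):
    """Hypothetical re-creation of the ValueError checks listed under Raises."""
    y_arr = np.asarray(y, dtype=float)
    x_arr = np.asarray(x, dtype=float)
    if y_arr.ndim != 1:
        raise ValueError("y must be 1d")
    if x_arr.ndim not in (1, 2):
        raise ValueError("x must be 1d or 2d")
    if x_arr.ndim == 1:
        # Assumption: a single flat list is one feature column.
        x_arr = x_arr[:, None]
    # Assumption: rows are observations, columns are features.
    if len(y_arr) != len(x_arr):
        raise ValueError("y and x must have the same length")
    # Assumption: "valid" means not NaN.
    if np.count_nonzero(~np.isnan(y_arr)) <= 2:
        raise ValueError("need more than 2 valid y values")
    return y_arr, x_arr
```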
select_features(num_features=None, init_selection=None, get_conditional_dependency=False)
Selects features using the Feature Ordering by Conditional Independence (FOCI) algorithm from Azadkia and Chatterjee (2021), "A simple measure of conditional dependence", Annals of Statistics.
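As we read the cited paper, the algorithm rests on a rank-based nearest-neighbour statistic. For $n$ observations, let $N_S(i)$ be the index of the Euclidean nearest neighbour of observation $i$ (excluding $i$) using only the features in $S$; the paper breaks ties uniformly at random. This summary is a sketch of the paper's definitions, not a description of this module's internals:

$$
R_i = \#\{j : Y_j \le Y_i\}, \qquad L_i = \#\{j : Y_j \ge Y_i\},
$$

$$
Q_n(Y, X_S) = \frac{1}{n^2} \sum_{i=1}^{n} \left( \min\{R_i, R_{N_S(i)}\} - \frac{L_i^2}{n} \right),
\qquad
T_n(Y, X_S) = \frac{\sum_{i=1}^{n} \left( n \min\{R_i, R_{N_S(i)}\} - L_i^2 \right)}{\sum_{i=1}^{n} L_i (n - L_i)}.
$$

Starting from $S_0 = \emptyset$, FOCI greedily adds

$$
j_{k+1} = \arg\max_{j \notin S_k} Q_n\left(Y, X_{S_k \cup \{j\}}\right), \qquad S_{k+1} = S_k \cup \{j_{k+1}\},
$$

and stops at the first $k$ with $Q_n(Y, X_{S_{k+1}}) \le Q_n(Y, X_{S_k})$, taking $Q_n(Y, X_\emptyset) = 0$. Equivalently, it stops once the conditional coefficient $T_n(Y, X_{j_{k+1}} \mid X_{S_k})$ is $\le 0$.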
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_features | int | Maximum number of features to select. Defaults to the number of features in x. | None |
init_selection | list | Initial selection of features. | None |
get_conditional_dependency | bool | If True, also returns the conditional dependency measure. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | List of selected features. If x was |
list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | Conditional dependency measure at the point each feature was selected. Only returned when get_conditional_dependency is True. |
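For intuition, the greedy selection described above can be sketched in NumPy. `foci_sketch` is a hypothetical helper, not this module's implementation, and it makes several labelled simplifications. It accepts only integer feature indices. It breaks nearest-neighbour ties by lowest index rather than at random. It treats `num_features` as a cap on the total number selected, including `init_selection`. As an assumption, it reports the Azadkia-Chatterjee coefficient between y and the currently selected features after each addition as the "conditional dependency" values.

```python
import numpy as np


def _nearest_neighbours(x: np.ndarray) -> np.ndarray:
    """Index of each row's Euclidean nearest neighbour, excluding the row itself.

    Ties go to the lowest index (the paper breaks them uniformly at random).
    """
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)
    return sq_dist.argmin(axis=1)


def _score(r_rank: np.ndarray, x_sub: np.ndarray) -> float:
    """sum_i min(R_i, R_N(i)): the only part of n^2 * Q_n(Y, X_S) that depends on S."""
    nn = _nearest_neighbours(x_sub)
    return float(np.minimum(r_rank, r_rank[nn]).sum())


def foci_sketch(y, x, num_features=None, init_selection=None,
                get_conditional_dependency=False):
    """Hypothetical greedy FOCI forward selection (O(n^2) memory)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # a single feature given as a flat list
    n, p = x.shape
    r_rank = (y[None, :] <= y[:, None]).sum(axis=1)  # R_i = #{j : Y_j <= Y_i}
    l_rank = (y[None, :] >= y[:, None]).sum(axis=1)  # L_i = #{j : Y_j >= Y_i}
    baseline = float((l_rank ** 2).sum()) / n        # score at which Q_n = 0
    denom = float((l_rank * (n - l_rank)).sum()) / n  # turns Q_n into T_n

    selected = list(init_selection or [])
    best_so_far = _score(r_rank, x[:, selected]) if selected else baseline
    # Assumption: num_features caps the total, including init_selection.
    max_features = p if num_features is None else min(num_features, p)
    dependencies = []

    while len(selected) < max_features:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: _score(r_rank, x[:, selected + [j]]) for j in candidates}
        best = max(scores, key=scores.get)
        # Stop once the best candidate no longer increases Q_n,
        # i.e. T_n(Y, X_best | X_selected) <= 0.
        if scores[best] <= best_so_far:
            break
        selected.append(best)
        best_so_far = scores[best]
        # Assumption: report T_n(Y, X_selected) after each addition.
        dependencies.append((best_so_far - baseline) / denom)

    if get_conditional_dependency:
        return selected, dependencies
    return selected
```

On data where y is a function of one column and the remaining columns are noise, the sketch should typically select that column first and then stop, because adding a noise column degrades the nearest-neighbour match in y. The brute-force distance matrix keeps the sketch short; a k-d tree would be the usual choice for larger n.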
select_features_using_foci(y, x, num_features=None, init_selection=None, get_conditional_dependency=False)
Implements the FOCI algorithm for feature selection.
Azadkia and Chatterjee (2021). "A simple measure of conditional dependence", Annals of Statistics. https://arxiv.org/abs/1910.12327.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | npt.ArrayLike | The dependent variable. A single list, 1D array, or pandas Series. | required |
x | npt.ArrayLike | The independent variables. A single list, list of lists, 1D/2D numpy array, pd.Series, or pd.DataFrame. | required |
num_features | int | Maximum number of features to select. Defaults to all features. | None |
init_selection | list | Initial selection of features. If | None |
get_conditional_dependency | bool | If True, also returns the conditional dependency measure. | False |
Returns:
Name | Type | Description |
---|---|---|
list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | List of selected features. If x was |
list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | Conditional dependency measure at the point each feature was selected. Only returned when get_conditional_dependency is True. |
Raises:
Type | Description |
---|---|
ValueError | If y is not 1d. |
ValueError | If x is not 1d or 2d. |
ValueError | If y and x have different lengths. |
ValueError | If there are <= 2 valid y values. |