FOCI module

FOCI

Class for computing FOCI (Feature Ordering by Conditional Independence).

__init__(y, x)

Initialize and validate the FOCI object.

You can then use the select_features method to select features.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| y | npt.ArrayLike | A single list or 1D array or a pandas Series. | required |
| x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If y is not 1d. |
| ValueError | If x is not 1d or 2d. |
| ValueError | If y and x have different lengths. |
| ValueError | If there are <= 2 valid y values. |
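
The four ValueError conditions above can be illustrated with a small stand-alone check. This is only a sketch, not the library's code: `validate_inputs`, `_ndim`, and `_is_valid` are hypothetical names, inputs are assumed to be plain Python lists (with a 2D x given as a list of rows), and "valid" is assumed to mean neither None nor NaN.

```python
import math


def _is_valid(value):
    # Assumption: a "valid" value is one that is neither None nor NaN.
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def _ndim(obj):
    # Nesting depth of a (possibly nested) list or tuple; scalars have depth 0.
    depth = 0
    while isinstance(obj, (list, tuple)):
        depth += 1
        obj = obj[0] if obj else None
    return depth


def validate_inputs(y, x):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if _ndim(y) != 1:
        raise ValueError("y must be 1D")
    if _ndim(x) not in (1, 2):
        raise ValueError("x must be 1D or 2D")
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if sum(_is_valid(v) for v in y) <= 2:
        raise ValueError("y must contain more than 2 valid values")


# Passes silently: 1D y, 2D x with one row per observation.
validate_inputs([1.0, 2.0, 3.0], [[0.1, 5], [0.2, 6], [0.3, 7]])
```

Checking the cheap structural conditions first means that a shape problem is reported before any per-value inspection of y takes place.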

select_features(num_features=None, init_selection=None, get_conditional_dependency=False)

Selects features using the Feature Ordering by Conditional Independence (FOCI) algorithm of Azadkia and Chatterjee (2021), "A simple measure of conditional dependence", Annals of Statistics.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| num_features | int | Maximum number of features to select. Defaults to the number of features in x. | None |
| init_selection | list | Initial selection of features. | None |
| get_conditional_dependency | bool | If True, also returns the conditional dependency measure for each selected feature. | False |

Returns:

| Name | Type | Description |
|------|------|-------------|
| list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | List of selected features. If x was a pd.DataFrame, these are column names; otherwise, they are indices. |
| list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | Conditional dependency measure recorded as each feature was selected. Returned only when get_conditional_dependency is True. |
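
To make the greedy selection concrete, here is a minimal pure-Python sketch of FOCI as described in the cited paper. It is not this module's implementation. The names `cond_dep` and `foci_sketch` are hypothetical, features are passed as a list of columns, nearest neighbours are found by brute force, and ties go to the lowest index (the paper breaks ties at random). At each step the sketch adds the feature with the largest estimated conditional dependence T_n(Y, X_j | selected) and stops once that estimate is no longer positive.

```python
import math


def _nearest(rows):
    """Index of each row's Euclidean nearest neighbour (ties: lowest index)."""
    result = []
    for i, p in enumerate(rows):
        best, best_d = -1, math.inf
        for j, q in enumerate(rows):
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if j != i and d < best_d:
                best, best_d = j, d
        result.append(best)
    return result


def _rows(cols, n):
    # Turn a list of columns into a list of n row tuples.
    return [tuple(c[i] for c in cols) for i in range(n)]


def cond_dep(y, z_cols, x_cols=()):
    """Estimate T_n(Y, Z | X); every column is a list of n numbers."""
    n = len(y)
    r = [sum(v <= yi for v in y) for yi in y]  # R_i = #{j : y_j <= y_i}
    m = _nearest(_rows(list(x_cols) + list(z_cols), n))
    joint = [min(r[i], r[m[i]]) for i in range(n)]
    if not x_cols:
        # Unconditional case: uses L_i = #{j : y_j >= y_i}.
        l = [sum(v >= yi for v in y) for yi in y]
        num = sum(n * joint[i] - l[i] ** 2 for i in range(n))
        den = sum(li * (n - li) for li in l)
    else:
        # Conditional case: compare neighbours in (X, Z) with neighbours in X.
        nx = _nearest(_rows(list(x_cols), n))
        base = [min(r[i], r[nx[i]]) for i in range(n)]
        num = sum(joint) - sum(base)
        den = sum(r) - sum(base)
    return num / den if den else 0.0


def foci_sketch(y, columns, num_features=None):
    """Greedy forward selection; returns (selected indices, scores)."""
    limit = len(columns) if num_features is None else min(num_features, len(columns))
    selected, scores = [], []
    while len(selected) < limit:
        given = [columns[j] for j in selected]
        rest = [j for j in range(len(columns)) if j not in selected]
        gains = {j: cond_dep(y, [columns[j]], given) for j in rest}
        best = max(rest, key=gains.get)
        if gains[best] <= 0:  # no remaining feature adds dependence
            break
        selected.append(best)
        scores.append(gains[best])
    return selected, scores


y = [1, 2, 3, 4, 5, 6, 7, 8]
informative = [2 * v for v in y]   # y is a function of this column
noise = [5, 1, 8, 3, 6, 2, 7, 4]   # a fixed permutation, unrelated to y
selected, scores = foci_sketch(y, [informative, noise], num_features=1)
print(selected, scores)
```

With only eight observations the estimates are crude, so the toy data is meant to show the control flow rather than realistic scores. The brute-force neighbour search is quadratic in the number of observations, so a real implementation would normally use a spatial index instead.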

select_features_using_foci(y, x, num_features=None, init_selection=None, get_conditional_dependency=False)

Implements the FOCI algorithm for feature selection.

Azadkia and Chatterjee (2021). "A simple measure of conditional dependence", Annals of Statistics. https://arxiv.org/abs/1910.12327.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| y | npt.ArrayLike | The dependent variable. A single list, 1D array, or pandas Series. | required |
| x | npt.ArrayLike | The independent variables. A single list, list of lists, 1D/2D numpy array, pd.Series, or pd.DataFrame. | required |
| num_features | int | Maximum number of features to select. Defaults to all features. | None |
| init_selection | list | Initial selection of features. If x is a pd.DataFrame, this is expected to be a list of column names. Otherwise, this is expected to be a list of indices. | None |
| get_conditional_dependency | bool | If True, also returns the conditional dependency measure for each selected feature. | False |

Returns:

| Name | Type | Description |
|------|------|-------------|
| list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | List of selected features. If x was a pd.DataFrame, these are column names; otherwise, they are indices. |
| list | Union[List[Union[int, str]], Tuple[List[Union[int, str]], List[float]]] | Conditional dependency measure recorded as each feature was selected. Returned only when get_conditional_dependency is True. |
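
The return convention described above (column names for a DataFrame, indices otherwise, and a tuple only when get_conditional_dependency is True) can be sketched as follows. `format_result` is a hypothetical helper written for illustration; it is not part of this module, and the `column_names` argument stands in for the columns of a pd.DataFrame.

```python
from typing import List, Optional, Sequence, Tuple, Union

Label = Union[int, str]


def format_result(
    selected: Sequence[int],
    scores: Sequence[float],
    column_names: Optional[Sequence[str]] = None,
    get_conditional_dependency: bool = False,
) -> Union[List[Label], Tuple[List[Label], List[float]]]:
    """Hypothetical helper illustrating the documented return shapes."""
    # Map positional indices to names only when names are available.
    if column_names is not None:
        labels: List[Label] = [column_names[j] for j in selected]
    else:
        labels = list(selected)
    if get_conditional_dependency:
        return labels, list(scores)
    return labels


# Indices in, indices out.
print(format_result([2, 0], [0.4, 0.1]))
# Names plus scores when both are requested.
print(format_result([2, 0], [0.4, 0.1], ["a", "b", "c"], True))
```

Because the return type depends on the flag, callers that always want the scores need to pass get_conditional_dependency=True and unpack a tuple.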

Raises:

| Type | Description |
|------|-------------|
| ValueError | If y is not 1d. |
| ValueError | If x is not 1d or 2d. |
| ValueError | If y and x have different lengths. |
| ValueError | If there are <= 2 valid y values. |