bloomberg/interval_targets
Learning with Interval Targets

This guide explains how to set up the environment and run experiments for learning with interval-based targets.

Setup and installation

  1. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate
  2. Install dependencies

    Install the required packages using the requirements.txt file:

    pip install -r requirements.txt

How to run experiments

Single experiment

Run main.py from your terminal with your chosen arguments. The script automatically performs a grid search for any argument that is given multiple values.

Example: This command tests two learning rates on the abalone dataset.

python main.py \
    --data_names "abalone" \
    --method "projection" \
    --lr 0.01 0.001 \
    --num_epochs 100 \
    --name "abalone_test_run"

The results will be saved to abalone_test_run.csv.
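Conceptually, the grid search expands every multi-valued argument into one run per combination. The sketch below illustrates this behavior; the `expand_grid` helper is hypothetical and not part of the repository's code.

```python
from itertools import product

def expand_grid(args: dict) -> list[dict]:
    """Expand multi-valued arguments into one dict per combination.

    `args` maps argument names to lists of candidate values
    (hypothetical helper, not from the repository).
    """
    keys = list(args)
    return [dict(zip(keys, combo)) for combo in product(*(args[k] for k in keys))]

# Two learning rates -> two runs, mirroring the command above.
runs = expand_grid({"lr": [0.01, 0.001], "num_epochs": [100]})
# runs == [{"lr": 0.01, "num_epochs": 100}, {"lr": 0.001, "num_epochs": 100}]
```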

Available methods

The --method argument controls the core training strategy.

Direct interval loss methods

  • projection: The model is only penalized if its prediction falls outside the target interval. If the prediction is inside, the loss is zero.
  • minmax: The loss is calculated against the worst-case label within the interval. It can be shown that this worst-case label will always be one of the interval's two endpoints.
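The two losses above can be sketched in NumPy as follows, assuming a squared base loss; the repository's exact loss definitions may differ.

```python
import numpy as np

def projection_loss(pred, lo, hi):
    """Squared distance from the prediction to the interval [lo, hi].

    Zero whenever lo <= pred <= hi (sketch; the base loss is assumed squared).
    """
    below = np.maximum(lo - pred, 0.0)  # how far below the interval
    above = np.maximum(pred - hi, 0.0)  # how far above the interval
    return (below + above) ** 2         # at most one term is nonzero

def minmax_loss(pred, lo, hi):
    """Worst-case squared loss over labels in [lo, hi].

    The maximizing label is always an interval endpoint, so only the
    two endpoints need to be checked.
    """
    return np.maximum((pred - lo) ** 2, (pred - hi) ** 2)

pred = np.array([0.5, 2.0])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
projection_loss(pred, lo, hi)  # -> [0., 1.]: zero inside, penalized outside
minmax_loss(pred, lo, hi)      # -> [0.25, 4.]: squared distance to the farther endpoint
```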

Pseudo-labeling (PL) methods

These methods use a two-stage process:

  1. Generate Pseudo-Labels: First, multiple models are trained using the projection loss. Their predictions on the training set become the "pseudo-labels." This approach incorporates the smoothness of the model class into the process of defining the target region.

  2. Train Final Model: A final model is then trained on these pseudo-labels using one of the following aggregation strategies:

    • minmax_pl_mean: Minimizes the average loss calculated against all pseudo-labels.
    • minmax_pl_max: Minimizes the maximum loss calculated against any of the pseudo-labels.
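The two aggregation strategies can be sketched as follows, assuming a squared loss and a `(num_models, num_samples)` array of pseudo-labels; the function names are illustrative and not from the repository.

```python
import numpy as np

def pl_mean_loss(pred, pseudo_labels):
    """minmax_pl_mean: average squared loss against all pseudo-labels.

    `pseudo_labels` has shape (num_models, num_samples) -- one row per
    projection-trained model (sketch of the aggregation only).
    """
    return np.mean((pred - pseudo_labels) ** 2, axis=0)

def pl_max_loss(pred, pseudo_labels):
    """minmax_pl_max: worst-case squared loss against any pseudo-label."""
    return np.max((pred - pseudo_labels) ** 2, axis=0)

# Three pseudo-labels for one training point; the prediction is 1.0.
pls = np.array([[0.0], [1.0], [2.0]])
pl_mean_loss(1.0, pls)  # averages the per-model losses (1, 0, 1)
pl_max_loss(1.0, pls)   # -> [1.]: the worst of (1, 0, 1)
```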

Using the LipMLP model

To use a Lipschitz-constrained MLP (LipMLP) instead of a standard MLP, pass --use_lip_mlp with the name of the method it should apply to. This option is currently supported for projection and all PL methods (but not minmax).

Example: The following command runs the projection loss with a LipMLP.

python main.py \
    --data_names "abalone" \
    --method "projection" \
    --lr 0.01 0.001 \
    --num_epochs 100 \
    --use_lip_mlp "projection" \
    --name "abalone_lipmlp_run"
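For intuition, one common way to build a Lipschitz-constrained layer is to rescale each weight row so the layer's Lipschitz constant (in the inf norm) stays below a positive bound. The sketch below is a hedged illustration of that idea; the repository's actual LipMLP parameterization may differ.

```python
import numpy as np

def lipschitz_linear(x, W, b, c):
    """One Lipschitz-constrained linear layer (illustrative sketch).

    Each row of W is rescaled so its absolute row sum is at most
    softplus(c), bounding the layer's inf-norm Lipschitz constant.
    """
    bound = np.log1p(np.exp(c))                      # softplus(c) > 0
    row_sums = np.abs(W).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, bound / row_sums)        # only shrink, never grow
    return x @ (W * scale).T + b

# A row with absolute sum 7 gets shrunk so the output cannot change
# faster than softplus(0) = log(2) per unit change in the input.
W, b, c = np.array([[3.0, 4.0]]), np.zeros(1), 0.0
y1 = lipschitz_linear(np.array([1.0, 0.0]), W, b, c)
y0 = lipschitz_linear(np.array([0.0, 0.0]), W, b, c)
```

Stacking such layers (with 1-Lipschitz activations like ReLU) bounds the Lipschitz constant of the whole network, which is what makes the pseudo-label stage sensitive to the smoothness of the model class.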

Adding New Datasets

Option 1: The .arff Method

For a dataset named my_data, place the file at dataset/my_data/my_data.arff. The script will find it automatically, assuming the target label is the first column. For example, you could download a wine_quality dataset and place it at dataset/wine_quality/wine_quality.arff.
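A dataset laid out this way can be read with scipy.io.arff. The sketch below uses a tiny in-memory file in place of dataset/my_data/my_data.arff and treats the first column as the target, as described above; the repository's own loader may differ in its details.

```python
import io
import pandas as pd
from scipy.io import arff

# A tiny in-memory .arff file standing in for dataset/my_data/my_data.arff.
ARFF = """\
@relation my_data
@attribute target numeric
@attribute feat1 numeric
@data
1.0,0.5
2.0,1.5
"""

data, meta = arff.loadarff(io.StringIO(ARFF))
df = pd.DataFrame(data)
y = df.iloc[:, 0]    # target label assumed to be the first column
X = df.iloc[:, 1:]   # remaining columns are the features
```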

Option 2: The DATASET_CONFIGS dictionary

This is the best way to handle CSV files. To add a new dataset, add a new entry to the DATASET_CONFIGS dictionary. For guidance, you can refer to the existing configuration for the abalone dataset. Download the abalone dataset and place the abalone folder inside your main dataset folder.

Example Entry:

# In main.py inside the DATASET_CONFIGS dictionary
"new_dataset": {
    "path": "dataset/new_dataset/data.csv",
    "loader_func": pd.read_csv,
    "target_col": "name_of_target_column",
    "drop_cols": ["id_column_to_drop"],
}

You can now use --data_names "new_dataset" in your commands.
