bloomberg/interval_targets
Learning with Interval Targets

This guide explains how to set up the environment and run experiments for learning with interval-based targets.

Setup and installation

  1. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate
  2. Install dependencies

    Install the required packages using the requirements.txt file:

    pip install -r requirements.txt

How to run experiments

Single experiment

Run main.py from your terminal with your chosen arguments. The script automatically performs a grid search for any argument that is given multiple values.

Example: This command tests two learning rates on the abalone dataset.

python main.py \
    --data_names "abalone" \
    --method "projection" \
    --lr 0.01 0.001 \
    --num_epochs 100 \
    --name "abalone_test_run"

The results will be saved to abalone_test_run.csv.
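Conceptually, the grid search expands every multi-valued argument into one run per combination. The sketch below illustrates this behavior; the `expand_grid` helper is hypothetical and not part of the repository's code.

```python
from itertools import product

def expand_grid(args: dict) -> list[dict]:
    """Expand multi-valued arguments into one dict per combination.

    `args` maps argument names to lists of candidate values
    (hypothetical helper, not from the repository).
    """
    keys = list(args)
    return [dict(zip(keys, combo)) for combo in product(*(args[k] for k in keys))]

# Two learning rates -> two runs, mirroring the command above.
runs = expand_grid({"lr": [0.01, 0.001], "num_epochs": [100]})
# runs == [{"lr": 0.01, "num_epochs": 100}, {"lr": 0.001, "num_epochs": 100}]
```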

Available methods

The --method argument controls the core training strategy.

Direct interval loss methods

  • projection: The model is only penalized if its prediction falls outside the target interval. If the prediction is inside, the loss is zero.
  • minmax: The loss is calculated against the worst-case label within the interval. It can be shown that this worst-case label will always be one of the interval's two endpoints.
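The two losses above can be sketched in NumPy as follows, assuming a squared base loss; the repository's exact loss definitions may differ.

```python
import numpy as np

def projection_loss(pred, lo, hi):
    """Squared distance from the prediction to the interval [lo, hi].

    Zero whenever lo <= pred <= hi (sketch; the base loss is assumed squared).
    """
    below = np.maximum(lo - pred, 0.0)  # how far below the interval
    above = np.maximum(pred - hi, 0.0)  # how far above the interval
    return (below + above) ** 2         # at most one term is nonzero

def minmax_loss(pred, lo, hi):
    """Worst-case squared loss over labels in [lo, hi].

    The maximizing label is always an interval endpoint, so only the
    two endpoints need to be checked.
    """
    return np.maximum((pred - lo) ** 2, (pred - hi) ** 2)

pred = np.array([0.5, 2.0])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
projection_loss(pred, lo, hi)  # -> [0., 1.]: zero inside, penalized outside
minmax_loss(pred, lo, hi)      # -> [0.25, 4.]: squared distance to the farther endpoint
```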

Pseudo-labeling (PL) methods

These methods use a two-stage process:

  1. Generate Pseudo-Labels: First, multiple models are trained using the projection loss. Their predictions on the training set become the "pseudo-labels." This approach incorporates the smoothness of the model class into the process of defining the target region.

  2. Train Final Model: A final model is then trained on these pseudo-labels using one of the following aggregation strategies:

    • minmax_pl_mean: Minimizes the average loss calculated against all pseudo-labels.
    • minmax_pl_max: Minimizes the maximum loss calculated against any of the pseudo-labels.
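The two aggregation strategies can be sketched as follows, assuming a squared loss and a `(num_models, num_samples)` array of pseudo-labels; the function names are illustrative and not from the repository.

```python
import numpy as np

def pl_mean_loss(pred, pseudo_labels):
    """minmax_pl_mean: average squared loss against all pseudo-labels.

    `pseudo_labels` has shape (num_models, num_samples) -- one row per
    projection-trained model (sketch of the aggregation only).
    """
    return np.mean((pred - pseudo_labels) ** 2, axis=0)

def pl_max_loss(pred, pseudo_labels):
    """minmax_pl_max: worst-case squared loss against any pseudo-label."""
    return np.max((pred - pseudo_labels) ** 2, axis=0)

# Three pseudo-labels for one training point; the prediction is 1.0.
pls = np.array([[0.0], [1.0], [2.0]])
pl_mean_loss(1.0, pls)  # averages the per-model losses (1, 0, 1)
pl_max_loss(1.0, pls)   # -> [1.]: the worst of (1, 0, 1)
```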

Using the LipMLP model

To use a Lipschitz-constrained MLP (LipMLP) instead of a standard MLP, pass --use_lip_mlp with the name of the method it should apply to. This option is currently supported for projection and all PL methods (but not minmax).

Example: The following command runs the projection loss with a LipMLP.

python main.py \
    --data_names "abalone" \
    --method "projection" \
    --lr 0.01 0.001 \
    --num_epochs 100 \
    --use_lip_mlp "projection" \
    --name "abalone_lipmlp_run"
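For intuition, one common way to build a Lipschitz-constrained layer is to rescale each weight row so the layer's Lipschitz constant (in the inf norm) stays below a positive bound. The sketch below is a hedged illustration of that idea; the repository's actual LipMLP parameterization may differ.

```python
import numpy as np

def lipschitz_linear(x, W, b, c):
    """One Lipschitz-constrained linear layer (illustrative sketch).

    Each row of W is rescaled so its absolute row sum is at most
    softplus(c), bounding the layer's inf-norm Lipschitz constant.
    """
    bound = np.log1p(np.exp(c))                      # softplus(c) > 0
    row_sums = np.abs(W).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, bound / row_sums)        # only shrink, never grow
    return x @ (W * scale).T + b

# A row with absolute sum 7 gets shrunk so the output cannot change
# faster than softplus(0) = log(2) per unit change in the input.
W, b, c = np.array([[3.0, 4.0]]), np.zeros(1), 0.0
y1 = lipschitz_linear(np.array([1.0, 0.0]), W, b, c)
y0 = lipschitz_linear(np.array([0.0, 0.0]), W, b, c)
```

Stacking such layers (with 1-Lipschitz activations like ReLU) bounds the Lipschitz constant of the whole network, which is what makes the pseudo-label stage sensitive to the smoothness of the model class.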

Adding New Datasets

Option 1: The .arff Method

For a dataset named my_data, place the file at dataset/my_data/my_data.arff. The script will find it automatically, assuming the target label is the first column. For example, you could download a wine_quality dataset and place it at dataset/wine_quality/wine_quality.arff.
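A dataset laid out this way can be read with scipy.io.arff. The sketch below uses a tiny in-memory file in place of dataset/my_data/my_data.arff and treats the first column as the target, as described above; the repository's own loader may differ in its details.

```python
import io
import pandas as pd
from scipy.io import arff

# A tiny in-memory .arff file standing in for dataset/my_data/my_data.arff.
ARFF = """\
@relation my_data
@attribute target numeric
@attribute feat1 numeric
@data
1.0,0.5
2.0,1.5
"""

data, meta = arff.loadarff(io.StringIO(ARFF))
df = pd.DataFrame(data)
y = df.iloc[:, 0]    # target label assumed to be the first column
X = df.iloc[:, 1:]   # remaining columns are the features
```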

Option 2: The DATASET_CONFIGS dictionary

This is the best way to handle CSV files. To add a new dataset, add a new entry to the DATASET_CONFIGS dictionary. For guidance, you can refer to the existing configuration for the abalone dataset. Download the abalone dataset and place the abalone folder inside your main dataset folder.

Example Entry:

# In main.py inside the DATASET_CONFIGS dictionary
"new_dataset": {
    "path": "dataset/new_dataset/data.csv",
    "loader_func": pd.read_csv,
    "target_col": "name_of_target_column",
    "drop_cols": ["id_column_to_drop"],
}

You can now use --data_names "new_dataset" in your commands.
