Commit: showing 7 changed files with 116 additions and 110 deletions.
@@ -0,0 +1,12 @@
figure.autolayout : True

axes.titlesize : 20
axes.labelsize : 15  ## fontsize of the x and y labels

font.size : 15

xtick.labelsize : 15  ## fontsize of the tick labels
ytick.labelsize : 15  ## fontsize of the tick labels

axes.spines.top : False
axes.spines.right : False
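The style file above uses matplotlib's simple `key : value  ## comment` format (matplotlib loads such a file with `plt.style.use`). As a sketch of how that format breaks down, here is a minimal stdlib-only parser; it is illustrative, not matplotlib's actual loader, and it leaves values as strings rather than coercing types:

```python
def parse_style(text):
    """Parse simple 'key : value  ## comment' lines from a style file."""
    params = {}
    for line in text.splitlines():
        line = line.split('##')[0].strip()  # drop inline comments
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(':')
        params[key.strip()] = value.strip()
    return params

style = parse_style("""
figure.autolayout : True
axes.titlesize : 20
axes.labelsize : 15  ## fontsize of the x and y labels
""")
```

After parsing, `style['axes.titlesize']` is the string `'20'`; matplotlib itself would coerce it to a number when applying the style.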
@@ -1,5 +1,13 @@
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.linear_model import LogisticRegression

# This is where we would implement our custom model

def get_model(args):
    """Return the model specified by args.model_name (an argparse namespace)."""
    if args.model_name == 'decision_tree':
        model = DecisionTreeClassifier(max_depth=args.max_depth)
    elif args.model_name == 'ridge':
        model = RidgeClassifier(alpha=args.alpha)
    else:
        raise ValueError('Invalid model_name: {}'.format(args.model_name))
    return model
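The if/elif dispatch in `get_model` can also be written as a dict-based registry, which scales better as models are added. A minimal sketch of that pattern, using stand-in tuples instead of real sklearn estimators so it stays self-contained (the names below are illustrative, not the repo's code):

```python
import argparse

# Registry equivalent to get_model's if/elif chain; each entry maps a
# model_name to a constructor taking the parsed argparse namespace.
# The tuples are stand-ins for the actual sklearn models.
MODELS = {
    'decision_tree': lambda args: ('DecisionTreeClassifier', args.max_depth),
    'ridge': lambda args: ('RidgeClassifier', args.alpha),
}

def get_model(args):
    if args.model_name not in MODELS:
        raise ValueError('Invalid model_name: {}'.format(args.model_name))
    return MODELS[args.model_name](args)

parser = argparse.ArgumentParser()
parser.add_argument('--model_name', default='decision_tree')
parser.add_argument('--max_depth', type=int, default=3)
parser.add_argument('--alpha', type=float, default=1.0)
model = get_model(parser.parse_args(['--model_name', 'ridge', '--alpha', '0.5']))
```

The registry also makes the set of valid `model_name` values discoverable (`MODELS.keys()`), which is handy for argparse `choices`.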
@@ -1,32 +1,33 @@
This is an evolving repo optimized for machine-learning projects aimed at designing a new algorithm. Such projects require sweeping over different hyperparameters, comparing to baselines, and iteratively refining an algorithm. Based on [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science).

# Organization
- `project_name`: should be renamed; contains the main code for modeling (e.g. model architecture)
- `experiments`: code for running experiments (e.g. loading data, training models, evaluating models)
- `scripts`: scripts for running experiments (e.g. python scripts that launch jobs in the `experiments` folder with different hyperparams)
- `notebooks`: jupyter notebooks for analyzing results and making figures

# Setup
- clone and run `pip install -e .`, resulting in a package named `project_name` that can be imported
- rename `project_name` to your project name and modify `setup.py` accordingly
- example run: run `python scripts/01_train_models.py` (which calls `experiments/01_train_model.py`), then view the results in `notebooks/01_model_results.ipynb`
# Features
- scripts sweep over hyperparameters using easy-to-specify python code
- experiments automatically cache runs that have already completed
  - caching uses the (**non-default**) arguments in the argparse namespace
- notebooks can easily evaluate results aggregated over multiple experiments using pandas
- binary arguments should start with the word "use" (e.g. `--use_caching`) and take values 0 or 1
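The caching feature keys runs on the non-default argparse arguments. One way that idea could be sketched (the function and its use of the parser's recorded defaults are hypothetical, not the repo's actual caching code):

```python
import argparse
import hashlib
import json

def cache_key(parser, args):
    """Build a cache key from only the non-default argparse arguments."""
    # collect each argument's default as recorded on the parser
    defaults = {a.dest: a.default for a in parser._actions if a.dest != 'help'}
    non_default = {k: v for k, v in vars(args).items() if defaults.get(k) != v}
    blob = json.dumps(non_default, sort_keys=True)  # stable key ordering
    return hashlib.md5(blob.encode()).hexdigest()

parser = argparse.ArgumentParser()
parser.add_argument('--model_name', default='decision_tree')
parser.add_argument('--max_depth', type=int, default=3)
key = cache_key(parser, parser.parse_args(['--max_depth', '5']))
```

Keying on non-default arguments means adding a new hyperparameter (with a default) later does not invalidate existing cached runs; note `parser._actions` is a private attribute used here only for illustration.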
# Guidelines
- See some useful packages [here](https://csinva.io/blog/misc/ml_coding_tips)
- Avoid notebooks whenever possible (ideally, use them only for analyzing results and making figures)
- Paths should be specified relative to a file's location (e.g. `os.path.join(os.path.dirname(__file__), 'data')`)
- Naming variables: put the main thing first, followed by its modifiers (e.g. `X_train`, `acc_test`)
- Use logging instead of print
- Use argparse and sweep over hyperparams using python scripts (or [amulet](https://amulet-docs.azurewebsites.net/main/index.html))
  - Note: arguments get passed as strings, so don't pass args that aren't primitives or a list of primitives (more complex structures should be handled in the experiments code)
- Each run should save a single pickle file of its results
- All experiments that depend on each other should run end-to-end with one script (caching things along the way)
- Keep an updated requirements.txt (required for amulet)
- Follow sklearn apis whenever possible
- Use Huggingface whenever possible, then pytorch
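The one-pickle-per-run guideline might look like the sketch below; the helper name and directory layout are illustrative assumptions, not the repo's actual code:

```python
import os
import pickle

def save_results(results, save_dir):
    """Save a single pickle holding everything this run produced."""
    os.makedirs(save_dir, exist_ok=True)  # one directory per run
    path = os.path.join(save_dir, 'results.pkl')
    with open(path, 'wb') as f:
        pickle.dump(results, f)
    return path
```

A notebook can then glob over run directories, load each `results.pkl`, and aggregate them into a single pandas DataFrame for comparison across hyperparameters.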