Cody A. Coleman faacf0462f Add simple README for PyTorch commands		2018-01-17 21:16:17 -08:00

README.md

Install

Install PyTorch v0.1.12. If you don't already have it set up, please follow the official install instructions.
Clone this repo and change into this directory:

git clone git@github.com:stanford-futuredata/dawn-bench-models.git
cd dawn-bench-models/pytorch/CIFAR10

Install this package in editable mode:

pip install -e .

Quick start

This package adds cifar10 and imagenet command line interfaces. Both include a train subcommand to learn a model from scratch. As an example, here is how to train ResNet164 with preactivation on CIFAR10:

cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 5 -l 0.01
cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 90 -l 0.1 --restore latest
cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 45 -l 0.01 --restore latest
cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 45 -l 0.001 --restore latest

The first command creates a new run of ResNet164 with preactivation (--arch preact164) in the ./run/preact164/[TIMESTAMP] directory and starts a warm-up of 5 epochs (-e 5) with SGD (--optimizer sgd) and a learning rate of 0.01 (-l 0.01). -c last saves a checkpoint only after the last epoch of the warm-up. -b 128 sets the batch size to 128. --augmentation turns on standard data augmentation, i.e. random crop and flip. --tracking writes training and validation results to ./run/preact164/[TIMESTAMP]/[train|valid]_results.csv as CSV files.
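To inspect what --tracking wrote, the results files can be read with Python's standard csv module. The following is a minimal sketch, not part of this package: latest_run and load_results are hypothetical helper names, picking the newest run by modification time is an assumption, and no column names are assumed because the README does not list them.

```python
import csv
from pathlib import Path


def latest_run(arch_dir):
    """Return the most recently modified run directory under arch_dir.

    Assumption: modification time is a good enough proxy for "latest";
    the format of the [TIMESTAMP] directory name is not relied on.
    """
    runs = [p for p in Path(arch_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs found in {arch_dir}")
    return max(runs, key=lambda p: p.stat().st_mtime)


def load_results(run_dir, split):
    """Read [train|valid]_results.csv from run_dir as a list of rows."""
    if split not in ("train", "valid"):
        raise ValueError("split must be 'train' or 'valid'")
    with open(Path(run_dir) / f"{split}_results.csv", newline="") as f:
        # Rows are returned as plain lists of strings; whether the first
        # row is a header is not assumed here.
        return list(csv.reader(f))


if __name__ == "__main__":
    run_dir = latest_run("./run/preact164")
    rows = load_results(run_dir, "valid")
    print(run_dir, len(rows), "rows")
    if rows:
        print("first row:", rows[0])
```

Printing the first row is a quick way to see which columns the files actually contain before plotting or comparing runs.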

The second command resumes the run from the first command (--restore latest) for another 90 epochs (-e 90) with a new learning rate of 0.1 (-l 0.1). The third and fourth commands work the same way: each resumes the latest run for 45 more epochs (-e 45), with the learning rate lowered to 0.01 and then 0.001.
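The four commands differ only in the number of epochs, the learning rate, and whether --restore latest is passed, so the schedule can be written down once and the commands generated from it. The following is a minimal sketch that uses only the flags shown above; COMMON, SCHEDULE, and build_commands are hypothetical names introduced for illustration.

```python
# Flags shared by all four stages, copied from the commands above.
COMMON = [
    "cifar10", "train", "-c", "last", "--augmentation", "--tracking",
    "-b", "128", "--optimizer", "sgd", "--arch", "preact164",
]

# (epochs, learning rate) for each stage, in order: warm-up first.
SCHEDULE = [(5, "0.01"), (90, "0.1"), (45, "0.01"), (45, "0.001")]


def build_commands(schedule=SCHEDULE):
    """Return one argument list per stage of the schedule."""
    commands = []
    for stage, (epochs, lr) in enumerate(schedule):
        args = COMMON + ["-e", str(epochs), "-l", lr]
        if stage > 0:
            # Every stage after the warm-up resumes the latest run.
            args += ["--restore", "latest"]
        commands.append(args)
    return commands


if __name__ == "__main__":
    # Print the commands; none of the arguments need shell quoting.
    for args in build_commands():
        print(" ".join(args))
```

To run the stages instead of printing them, each argument list could be passed to subprocess.run(args, check=True), which stops the sequence if a stage fails.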