dawn-bench-models/pytorch/CIFAR10
2018-01-17 21:16:17 -08:00
benchmark Make pytorch evaluate run without optimizer in checkpoint 2018-01-17 20:40:46 -08:00
.gitignore Update pytorch benchmark code with new command line interface 2017-12-11 11:35:48 -08:00
README.md Add simple README for PyTorch commands 2018-01-17 21:16:17 -08:00
setup.py Update pytorch benchmark code with new command line interface 2017-12-11 11:35:48 -08:00

Install

  1. Install PyTorch v0.1.12. If you don't already have it set up, please follow the official install instructions.
  2. Clone this repo and change into this directory:
git clone git@github.com:stanford-futuredata/dawn-bench-models.git
cd dawn-bench-models/pytorch/CIFAR10
  3. Install this package:
pip install -e .
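To confirm that pip install -e . registered the command line interfaces, a minimal Python sketch can look them up on PATH. It assumes only what the Quick start states, namely that the package installs commands named cifar10 and imagenet. The helper name check_entry_points is hypothetical and not part of this package.

```python
import shutil

# Command names this README says the package adds (assumption: installed as console scripts).
EXPECTED_COMMANDS = ("cifar10", "imagenet")


def check_entry_points(names=EXPECTED_COMMANDS):
    """Map each command name to its resolved path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_entry_points().items():
        print("{}: {}".format(name, path or "NOT FOUND on PATH"))
```

If either command shows as not found, the virtual environment used for pip install -e . is probably not the active one.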

Quick start

This package adds cifar10 and imagenet command line interfaces. Both include a train subcommand for training a model from scratch. As an example, here is how to train ResNet164 with preactivation on CIFAR10:

cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 5 -l 0.01
cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 90 -l 0.1 --restore latest
cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 45 -l 0.01 --restore latest
cifar10 train -c last --augmentation --tracking -b 128 --optimizer sgd --arch preact164 -e 45 -l 0.001 --restore latest
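Because each command after the first resumes from the latest checkpoint, the four stages have to run in order, and a failed stage should stop the ones after it. A minimal Python sketch of such a driver is below; it rebuilds exactly the four commands above and, by default, only prints them (dry run). The names COMMON, STAGES, build_commands, and run_all are hypothetical, and the sketch assumes it is launched from the directory where ./run should be created.

```python
import subprocess

# Flags shared by every stage, copied from the Quick start commands.
COMMON = ["cifar10", "train", "-c", "last", "--augmentation", "--tracking",
          "-b", "128", "--optimizer", "sgd", "--arch", "preact164"]

# (epochs, learning rate, resume from latest checkpoint?) for each stage, in order.
STAGES = [(5, 0.01, False), (90, 0.1, True), (45, 0.01, True), (45, 0.001, True)]


def build_commands():
    """Return the argv list for each stage, in the order the stages must run."""
    commands = []
    for epochs, lr, restore in STAGES:
        argv = COMMON + ["-e", str(epochs), "-l", str(lr)]
        if restore:
            argv += ["--restore", "latest"]
        commands.append(argv)
    return commands


def run_all(dry_run=True):
    """Print each command; when dry_run is False, also execute it."""
    for argv in build_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on failure, so a later stage
            # never resumes from the checkpoint of a stage that did not finish.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) would launch the real training runs one after another.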

The first command creates a new run of ResNet164 with preactivation (--arch preact164) in the ./run/preact164/[TIMESTAMP] directory and starts a 5-epoch warm-up (-e 5) using SGD (--optimizer sgd) with a learning rate of 0.01 (-l 0.01). -c last saves a checkpoint only after the last epoch of the warm-up. -b 128 sets the batch size to 128. --augmentation turns on standard data augmentation, i.e., random cropping and flipping. --tracking saves training and validation results to CSV files at ./run/preact164/[TIMESTAMP]/[train|valid]_results.csv.
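With --tracking on, the results of the most recent run can be read back from those CSV files. The sketch below picks the newest ./run/preact164/[TIMESTAMP] directory by modification time, because the exact TIMESTAMP format is not documented here, and loads [train|valid]_results.csv with Python's csv module. It assumes each file starts with a header row; the column names themselves are not assumed. latest_run_dir and load_results are hypothetical helper names.

```python
import csv
import os


def latest_run_dir(arch="preact164", root="run"):
    """Return the most recently modified <root>/<arch>/<TIMESTAMP> directory."""
    arch_dir = os.path.join(root, arch)
    runs = [os.path.join(arch_dir, d) for d in os.listdir(arch_dir)
            if os.path.isdir(os.path.join(arch_dir, d))]
    if not runs:
        raise FileNotFoundError("no runs under " + arch_dir)
    # Modification time is used instead of sorting names, since the
    # TIMESTAMP format may not sort lexically.
    return max(runs, key=os.path.getmtime)


def load_results(run_dir, split="train"):
    """Read <split>_results.csv (split is 'train' or 'valid') into a list of dicts.

    Assumes the first row of the file is a header; values stay as strings.
    """
    path = os.path.join(run_dir, split + "_results.csv")
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, load_results(latest_run_dir(), "valid") returns one dict per logged row, keyed by whatever columns the file's header defines.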

The second command resumes the run created by the first command (--restore latest) and trains for another 90 epochs (-e 90) with a higher learning rate of 0.1 (-l 0.1). The third and fourth commands work the same way: each resumes from the latest checkpoint and trains for another 45 epochs, first at a learning rate of 0.01 and then at 0.001, for 185 epochs in total across all four commands.
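Taken together, the four commands form a piecewise-constant learning-rate schedule. The sketch below maps a 0-based global epoch index to the learning rate in effect. It assumes, as this README describes, that -e on a resumed run adds that many epochs, and that the learning rate stays constant within each command (the commands do not say otherwise). SCHEDULE and lr_at are hypothetical names.

```python
# (epochs, learning rate) per command, in the order of the Quick start.
SCHEDULE = [(5, 0.01), (90, 0.1), (45, 0.01), (45, 0.001)]
TOTAL_EPOCHS = sum(epochs for epochs, _ in SCHEDULE)  # 5 + 90 + 45 + 45 = 185


def lr_at(epoch):
    """Learning rate in effect at a 0-based global epoch index."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError("epoch {} is outside 0..{}".format(epoch, TOTAL_EPOCHS - 1))
    start = 0
    for epochs, lr in SCHEDULE:
        # Each stage covers the half-open range [start, start + epochs).
        if epoch < start + epochs:
            return lr
        start += epochs
```

Under these assumptions, epochs 0 to 4 use 0.01, 5 to 94 use 0.1, 95 to 139 use 0.01, and 140 to 184 use 0.001.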