dawn-bench-models/tensorflow/CIFAR10/resnet/README.md
Deepak Narayanan b7e1e0fa0f First commit
2017-08-17 11:43:17 -07:00

ResNet on CIFAR10 and CIFAR100

(Borrowed from the tensorflow/models repository)

Dataset

https://www.cs.toronto.edu/~kriz/cifar.html

Setting

  • Data augmentation: pad to 36x36, take a random 32x32 crop, and apply a random horizontal flip. Per-image whitening.
  • Momentum optimizer (momentum = 0.9).
  • Learning rate schedule: 0.01 (1 epoch), 0.1 (90 epochs), 0.01 (45 epochs), 0.001 (45 epochs).
  • L2 weight decay: 0.005.
  • Batch size: 128. (The 28-10 wide and 1001 layer bottleneck models use 64.)
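The learning rate schedule above can be read as a piecewise-constant function of the epoch. The following is a minimal sketch of that reading, not the code in resnet_main.py: the names `SCHEDULE` and `learning_rate_for_epoch` are hypothetical, epochs are assumed to be 0-indexed, the four phases are assumed to run back to back, and holding the final rate after the last phase is also an assumption.

```python
# Sketch of the schedule listed above as (number of epochs, learning rate) phases.
# Assumption: phases run consecutively and epochs are counted from 0.
SCHEDULE = [(1, 0.01), (90, 0.1), (45, 0.01), (45, 0.001)]


def learning_rate_for_epoch(epoch):
    """Return the learning rate for a 0-indexed epoch."""
    boundary = 0
    for num_epochs, lr in SCHEDULE:
        boundary += num_epochs
        if epoch < boundary:
            return lr
    # Assumption: keep the final rate if training continues past the schedule.
    return SCHEDULE[-1][1]


if __name__ == "__main__":
    for epoch in (0, 1, 90, 91, 135, 136, 180):
        print(epoch, learning_rate_for_epoch(epoch))
```

Under this reading the schedule covers 181 epochs in total: one warm-up epoch at 0.01, then 90, 45, and 45 epochs at 0.1, 0.01, and 0.001.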

Results

CIFAR-10 Model           Best Precision   Steps
32 layer                 92.5%            ~80k
110 layer                93.6%            ~80k
164 layer bottleneck     94.5%            ~80k
1001 layer bottleneck    94.9%            ~80k
28-10 wide               95%              ~90k

CIFAR-100 Model          Best Precision   Steps
32 layer                 68.1%            ~45k
110 layer                71.3%            ~60k
164 layer bottleneck     75.7%            ~50k
1001 layer bottleneck    78.2%            ~70k
28-10 wide               78.3%            ~70k
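The results are reported in training steps, while the schedule under Setting is given in epochs. The sketch below converts between the two. It assumes the full 50,000-image CIFAR training set and the batch sizes listed under Setting; the helper name `steps_to_epochs` is hypothetical, and nothing here claims that the reported runs used that schedule.

```python
# Number of training images in CIFAR-10 and in CIFAR-100.
TRAIN_IMAGES = 50000


def steps_to_epochs(steps, batch_size, train_images=TRAIN_IMAGES):
    """Convert a number of training steps into (possibly fractional) epochs."""
    return steps * batch_size / train_images


if __name__ == "__main__":
    # ~80k steps at batch size 128, and ~90k steps at batch size 64.
    print(steps_to_epochs(80000, 128))
    print(steps_to_epochs(90000, 64))
```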

Prerequisites

  1. Install TensorFlow 1.2 (preferably from source for higher performance) and Python 3.6.2.

  2. Download the CIFAR-10 and/or CIFAR-100 dataset (binary version).

curl -o cifar-10-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
curl -o cifar-100-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-100-binary.tar.gz
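The binary archives hold fixed-length records, as described on the dataset page: in CIFAR-10 each record is 1 label byte followed by 3072 pixel bytes (32x32, stored as the red, green, then blue plane), and in CIFAR-100 each record has 2 label bytes (coarse, then fine) before the pixels. A minimal sketch for sanity-checking a downloaded file follows. The function name `parse_cifar_records` is hypothetical and is not part of cifar_input.py.

```python
IMAGE_BYTES = 32 * 32 * 3  # 3072 pixel bytes per image


def parse_cifar_records(data, label_bytes=1):
    """Split raw .bin contents into (label, pixel_bytes) tuples.

    Use label_bytes=1 for CIFAR-10 and label_bytes=2 for CIFAR-100.
    For CIFAR-100 the returned label is the fine label (the last label byte).
    """
    record_bytes = label_bytes + IMAGE_BYTES
    if len(data) % record_bytes != 0:
        raise ValueError("file size is not a multiple of the record size")
    records = []
    for offset in range(0, len(data), record_bytes):
        record = data[offset:offset + record_bytes]
        label = record[label_bytes - 1]
        pixels = record[label_bytes:]
        records.append((label, pixels))
    return records
```

For example, reading `data_batch_1.bin` in binary mode and passing its contents to this function should yield 10,000 records with labels in the range 0 to 9.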

How to run

# cd to the models repository and run the commands below with bash. Expected output is shown.
# The directory should contain an empty WORKSPACE file, the resnet code, and the cifar10 dataset.
# Note: You can hold out 5k examples from the training set to use as an eval set.
$ ls -R
.:
cifar10  resnet  WORKSPACE

./cifar10:
data_batch_1.bin  data_batch_2.bin  data_batch_3.bin  data_batch_4.bin
data_batch_5.bin  test_batch.bin

./resnet:
cifar_input.py  README.md  resnet_main.py  resnet_model.py

# Train the model.
$ python3 resnet/resnet_main.py --train_data_path=cifar10/data_batch* \
                                --log_root=/tmp/resnet_model \
                                --train_dir=/tmp/resnet_model/train \
                                --dataset='cifar10' \
                                --num_gpus=1

# While the model is training, you can monitor its progress with TensorBoard:
$ tensorboard --logdir=/tmp/resnet_model

# Evaluate the model.
# Avoid running this on the same GPU as the training job at the same time;
# otherwise, you might run out of memory.
$ python3 resnet/resnet_main.py --eval_data_path=cifar10/test_batch.bin \
                                --log_root=/tmp/resnet_model \
                                --eval_dir=/tmp/resnet_model/test \
                                --mode=eval \
                                --dataset='cifar10' \
                                --num_gpus=0
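The note above about holding out 5k training examples for evaluation can be sketched as a split of one CIFAR-10 batch file at a record boundary. This is an illustration only: `split_batch_file` and the output file names are hypothetical, the split takes the last records of the file without shuffling, and class balance in the held-out set is not guaranteed.

```python
RECORD_BYTES = 1 + 32 * 32 * 3  # CIFAR-10: 1 label byte + 3072 pixel bytes


def split_batch_file(src_path, train_out_path, eval_out_path, num_eval):
    """Write the last num_eval records of src_path to eval_out_path
    and the remaining records to train_out_path.

    Returns (number of training records, number of eval records).
    """
    with open(src_path, "rb") as f:
        data = f.read()
    if len(data) % RECORD_BYTES != 0:
        raise ValueError("file size is not a multiple of the record size")
    num_records = len(data) // RECORD_BYTES
    if num_eval > num_records:
        raise ValueError("not enough records to hold out")
    cut = (num_records - num_eval) * RECORD_BYTES
    with open(train_out_path, "wb") as f:
        f.write(data[:cut])
    with open(eval_out_path, "wb") as f:
        f.write(data[cut:])
    return num_records - num_eval, num_eval


if __name__ == "__main__":
    # Hypothetical file names; adjust to your layout.
    print(split_batch_file("cifar10/data_batch_5.bin",
                           "cifar10/data_batch_5_train.bin",
                           "cifar10/heldout_eval.bin",
                           5000))
```

If you use a split like this, move the original `data_batch_5.bin` out of the `cifar10` directory so that the `data_batch*` pattern passed to `--train_data_path` no longer matches it, and point `--eval_data_path` at the held-out file.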