Running Tensorflow with Docker on GCP

Provision Virtual Machine

$ gcloud auth login

$ gcloud config set project machine-learning-000000 # Your project id

$ gcloud beta compute \
addresses create mlvm \
--region=us-east1 \
--network-tier=PREMIUM

$ MLVM_IP="$(gcloud beta compute \
addresses describe mlvm \
--region=us-east1 \
| head -n1 | awk '{print $2}')"

$ gcloud beta compute \
instances create mlvm \
--zone=us-east1-b \
--machine-type=n1-standard-2 \
--subnet=default \
--network-tier=PREMIUM \
--address="$MLVM_IP" \
--maintenance-policy=TERMINATE \
--no-service-account \
--no-scopes \
--accelerator=type=nvidia-tesla-p100,count=1 \
--image=centos-7-v20181011 \
--image-project=centos-cloud \
--boot-disk-size=40GB \
--boot-disk-type=pd-standard \
--boot-disk-device-name=mlvm

Configure Virtual Machine

$ gcloud beta compute ssh user@mlvm

$ sudo su

$ cd ~/

$ curl https://download.docker.com/linux/centos/docker-ce.repo \
> /etc/yum.repos.d/docker-ce.repo

$ curl https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo \
> /etc/yum.repos.d/nvidia-docker.repo

$ yum install --assumeyes \
"@Development Tools" \
"kernel-devel-$(uname -r)" \
"kernel-headers-$(uname -r)" \
"docker-ce-18.06.1" \
"nvidia-docker2-2.0.3"

$ curl https://us.download.nvidia.com/tesla/396.44/NVIDIA-Linux-x86_64-396.44.run \
> NVIDIA-Linux-x86_64-396.44.run

$ sh NVIDIA-Linux-x86_64-396.44.run --silent

$ systemctl enable docker

$ systemctl start docker

$ docker run \
--runtime=nvidia \
-it \
--rm \
tensorflow/tensorflow:1.11.0-devel-gpu \
python -c "import tensorflow as tf; print(tf.contrib.eager.num_gpus())"

😄🙌🎉…🔥💰

Destroy Virtual Machine

$ exit # Exit from 'sudo su'

$ exit # Exit from 'gcloud beta compute ssh user@mlvm'

$ gcloud beta compute \
instances delete mlvm \
--zone=us-east1-b

$ gcloud beta compute \
addresses delete mlvm \
--region=us-east1

Benchmarking Ruby with GCC (4.4, 4.7, 4.8, 4.9) and Clang (3.2, 3.3, 3.4, 3.5)

This post is partially inspired by Braulio Bhavamitra’s comments about Ruby being faster when compiled with Clang rather than GCC, and partially by Brendan Gregg’s comments about compiler optimization during his Flame Graphs talk at USENIX LISA13 (0:33:30).

In short, I wanted to look at what kind of performance we are leaving on the table by not taking advantage of 1) the newest compiler versions and 2) the most aggressive compiler optimizations. This is especially pertinent to those of us deploying applications on PaaS infrastructure, where we often have zero control over such things. Does the cost-benefit analysis still work out the same once you take a 10/20/30% performance hit into account?

All tests were run on AWS on an m3.medium EC2 instance, and the AMI used was a modified copy of one of my weekly generated Gentoo Linux AMIs. The version of Ruby was 2.1, while the tests themselves are from Antonio Cangiano’s Ruby Benchmark Suite. The tooling used to run them is available on my GitHub if you want to try this out for yourself.

The full test suite was run for each of the following compiler variants. O3 was not used with Clang, since there it only adds a single additional flag:

  • GCC 4.4 with O2 – Ships with Ubuntu 10.04 (Lucid) & RHEL/CentOS 6
  • GCC 4.4 with O3
  • GCC 4.7 with O2 – Ships with Debian 7 (Wheezy) & Ubuntu 12.04 (Precise)
  • GCC 4.7 with O3
  • GCC 4.8 with O2 – Ships with Ubuntu 14.04 (Trusty) & RHEL/CentOS 7
  • GCC 4.8 with O3
  • GCC 4.9 with O2 – Ships with Debian 8 (Jessie)
  • GCC 4.9 with O3
  • Clang 3.2 with O2
  • Clang 3.3 with O2
  • Clang 3.4 with O2
  • Clang 3.5 with O2

Each variant was then given a number of points per test based on its ranking: 0 points to the variant which performed best, 1 to the second best, and so on, until 11 points were given to the variant which performed worst.

These scores were then added up per variant and plotted onto a bar graph to try and visualize performance per variant.
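The rank-based scoring described above can be sketched in a few lines of Ruby. The variant names and timings below are illustrative placeholders, not the measured data from the benchmark runs:

```ruby
# Rank-based scoring: per test, the fastest variant gets 0 points,
# the slowest gets (variants - 1); lower totals are better overall.

# Illustrative timings in seconds, keyed by variant then test name.
timings = {
  'gcc-4.9-O2'   => { 'bm_app_fib' => 1.10, 'bm_so_matrix' => 2.40 },
  'gcc-4.8-O2'   => { 'bm_app_fib' => 1.15, 'bm_so_matrix' => 2.35 },
  'clang-3.4-O2' => { 'bm_app_fib' => 1.30, 'bm_so_matrix' => 2.60 },
}

scores = Hash.new(0)
tests = timings.values.flat_map(&:keys).uniq

tests.each do |test|
  # Sort variants fastest-first, then award points equal to the rank.
  ranked = timings.keys.sort_by { |variant| timings[variant][test] }
  ranked.each_with_index { |variant, rank| scores[variant] += rank }
end

scores.sort_by { |_, total| total }.each do |variant, total|
  puts "#{variant}: #{total}"
end
```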

From this we can determine that:

  1. Your choice of compiler does have a non-negligible effect on the performance of your runtime environment.
  2. Modern versions of GCC (4.7 & 4.8) and Clang (3.2 & 3.3) have very similar performance.
  3. Clang 3.4 seems to suffer from some performance regressions in this context.
  4. The latest version of GCC (4.9) is ahead by a clear margin.
  5. All O3 variants except GCC 4.8 performed worse than their O2 counterparts. This is not that unusual: very often using O3 will degrade performance or
    even break an application altogether. However, the default Makefile shipped with Ruby 1.9.3 and above uses O3, which appears to hurt performance.

Of course the standard disclaimers apply. Benchmarking correctly is hard, you may not see the same results in your specific environment, do not immediately recompile everything in prod using GCC 4.9, etc.

Update:

Lots of people asked to see the raw data plotted as well as the relative performance, so here it is. For each test, the average across all variants was calculated and taken as the baseline, marked as 0. Then for each test/variant pair a percentage was calculated showing how much faster or slower it was than the baseline.

For example on test eight GCC 4.9 O2 was 7% faster than the baseline while Clang 3.5 was 2% faster than the baseline. From this we can infer that GCC 4.9 O2 was 5% faster than Clang 3.5 in that test.

Since this makes the graph very cluttered, it is best to select only a few variants at once; you can also pan and zoom.
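The baseline calculation can be sketched in Ruby. The times below are made up to reproduce the worked example (7% and 2% faster than baseline), not the real measurements:

```ruby
# Relative performance: the per-test baseline is the mean time across
# all variants; each variant is then expressed as a percentage
# faster (positive) or slower (negative) than that baseline.

# Illustrative timings for a single test, in seconds.
times = { 'gcc-4.9-O2' => 0.93, 'clang-3.5-O2' => 0.98, 'gcc-4.4-O2' => 1.09 }

baseline = times.values.inject(:+) / times.size
relative = Hash[times.map { |variant, t| [variant, ((baseline - t) / baseline * 100).round] }]

relative.each { |variant, pct| puts "#{variant}: #{pct}%" }
```

With these numbers the baseline is 1.0s, so GCC 4.9 comes out 7% faster and Clang 3.5 comes out 2% faster, matching the example above.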

Listing EC2 instances in all regions

When working with EC2 instances across multiple regions, I’ve found it’s nearly impossible to get a good overview of what is running where. This can be especially annoying when you are automatically launching a number of short-lived instances.

To avoid having to go through 9 different web pages to see what I currently have running, I found it easier to just use the API and list active instances from the CLI.

Install dependencies:

$ gem install aws-sdk pmap

/usr/local/bin/aws-list:

#!/usr/bin/ruby

require 'aws-sdk'
require 'pmap'

# Build an EC2 client for the given region using credentials
# taken from the environment.
def ec2(region = 'us-east-1')
  AWS::EC2.new(
    access_key_id: ENV['AWS_ACCESS_KEY'],
    secret_access_key: ENV['AWS_SECRET_KEY'],
    region: region
  )
end

# Query every region in parallel and collect all non-terminated
# instances. The mutex guards the shared array, since peach runs
# each region's block in its own thread.
def list_instances
  instances = []
  mutex = Mutex.new
  ec2.regions.peach do |region|
    ec2.regions[region.name].instances.each do |instance|
      next if instance.status == :terminated
      mutex.synchronize { instances << instance }
    end
  end
  instances
end

list_instances.each do |instance|
  puts "#{instance.id}\t\t#{instance.availability_zone}\t\t#{instance.status}\t\t#{instance.ip_address}"
end

Listing instances:

$ export AWS_ACCESS_KEY="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
$ export AWS_SECRET_KEY="ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
$ aws-list

i-16b78754 eu-west-1a running 54.77.218.113
i-0025e1e6 eu-west-1a running 54.76.129.127
i-3926e2df eu-west-1a running 54.154.52.146
i-4924e0af eu-west-1a running 54.154.52.77
i-c424e022 eu-west-1a running 54.72.131.127
i-0c25e1ea eu-west-1a running 54.154.51.140
i-9c25e17a eu-west-1a running 54.154.49.204
i-4b24e0ad eu-west-1a running 54.77.225.135
i-33e929f2 eu-central-1b running 54.93.164.233
i-c324e025 eu-west-1a running 54.76.98.165
i-3f26e2d9 eu-west-1a running 54.154.47.126
i-8027e366 eu-west-1a running 54.154.20.140
i-0d25e1eb eu-west-1a running 54.77.100.132
i-d718edd9 us-west-2c running 54.149.35.63
i-0c2028e6 us-east-1a running 54.164.193.104
i-5b95e54e sa-east-1a running 54.94.165.7
i-2dad38de ap-northeast-1a running 54.65.157.129
i-625a80af ap-southeast-1a running 54.169.195.201
i-a06e006f ap-southeast-2a running 54.66.184.34
i-2dbce5e5 us-west-1a running 54.67.67.18