OpenMediaVault NAS Setup with Docker Applications

My 10-year-old NAS finally died. It ran Ubuntu, with a bunch of apps and services installed via the OS’s package manager, and some installed and compiled from source. I used the LTS version of Ubuntu for longer-term support, and I actually did full OS upgrades once or twice. However, I stopped around version 16 or so: after a full upgrade, certain apps or setups would break (packages removed, dependencies missing or no longer compatible, etc.).

For my new NAS, I thought about buying a fully built NAS from Synology based on this article from Linux Journal. Although having a pre-built product seems very tempting, I just couldn’t justify the added premium, knowing that I was technical enough to install a Linux OS, that there are distros like OpenMediaVault (OMV) that are highly stable and user-friendly, and that Docker is also used to install apps on Synology’s products.

I bought a 4-bay server from U-NAS because I wanted a small footprint, low power consumption, and the ability to add and remove drives easily. The U-NAS chassis is similar to my old Acer NAS. I went with OpenMediaVault because it was designed to be user-friendly and because I don’t need ZFS (used by FreeNAS).

I wanted to document my setup in case I needed to re-do this in the future for other servers. I followed this guide and this guide to install and setup my server. Here are some things that I want to document.

First, after I installed the OS, I received the following error message when the server booted up: mdadm: no arrays found in config file or automatically. Apparently, this happens because I installed the OS on an SSD and then added additional HDDs. This post suggests a fix: boot up the server with only the SSD, plug in the HDDs, log in as root, execute update-grub, and restart the server.

  1. After the OS is installed, visit the server’s IP address to reach OMV’s web console. Change the default password.
  2. Since it is a NAS, I installed multiple hard drives for data storage in addition to the OS drive. Assuming the drives have been formatted (eg, as ext4), they should be mounted by default under /srv/.
  3. Create shared folders under Access Rights Management > Shared Folders. I created one called home, and one called share. The former is used to store the home directories of the users (I prefer to have this enabled for users with ssh access). The latter is used to store my media files to be shared across the network.
  4. Create users under Access Rights Management > User.
    • At least one user belonging to the following groups (my admin account via ssh): users, ssh, sudo, and root.
    • At least one generic user used to access files from the TV or other media devices (eg, mediauser), belonging to the following group: users.
  5. Turn on SSH and SMB/CIFS services under Services.
  6. I was able to install omv-extras by running the following in the terminal: wget -O - | sudo bash
  7. In the web console, I can now install Docker and Portainer under System > OMV-Extras > Docker.
  8. One can log into Portainer’s web console at server_ip:9000. Set up the admin password.
  9. Here is an example of how to install a Docker app using Portainer, with Heimdall as the example. Since we’ll be installing various apps and services as self-contained Docker containers, we’ll have to manage many of them via their own web consoles (on different ports); Heimdall bookmarks all the URLs so we have a landing page that itemizes all the apps for easier navigation. I will show how to enter the critical information from the Docker Hub page; compare the values from that page with what’s entered here.
    • In Portainer, go to the Container page. Click Add container.
    • Name: heimdall
    • Registry: leave as DockerHub
    • Image: paste in linuxserver/heimdall
    • Manual network port publishing: add 2 ports (ports description under the docker-compose schema)
      • container 80 -> host 8888 (host port 80 goes to OMV’s web console by default)
      • container 443 -> host 8889
    • Under Advanced container settings > Volumes, map the following container -> volume paths (volumes description under the docker-compose schema)
      • /config -> /appdata/heimdall (NOTE: create /appdata on the OS hard drive, as the config folder for all Docker apps will be stored here using the layout /appdata/name_of_app).
    • Under Advanced container settings > Env, enter the following environment variables (environment description under the docker-compose schema)
      • PUID -> 1000 (for the primary user set up in step 4, type the following in the web console to get the user id: id -u username)
      • PGID -> 100 (for the primary user set up in step 4, type the following in the web console to get the user id: id -g username)
    • Under Advanced container settings > Restart policy, select Unless Stopped
    • Click Deploy the container to start the app.
    • Visit server_ip:8888 to get to the Heimdall page.
  10. Additional Docker apps can be installed in a similar fashion to the previous step. Note that some apps will also have a /data or /downloads folder, so map them accordingly. Some apps also require the Docker option --cap-add=NET_ADMIN; this is set under Advanced container settings > Capabilities.
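For reference, the Portainer settings entered in step 9 translate to a docker-compose file along these lines (a sketch; the image, ports, volume path, and PUID/PGID values come from the steps above):

```yaml
version: "2"
services:
  heimdall:
    image: linuxserver/heimdall
    container_name: heimdall
    environment:
      - PUID=1000   # from: id -u username
      - PGID=100    # from: id -g username
    volumes:
      - /appdata/heimdall:/config
    ports:
      - "8888:80"   # host 8888 -> container 80 (host port 80 is used by OMV)
      - "8889:443"
    restart: unless-stopped
```

Keeping a compose file like this around makes it easy to recreate the container if the Portainer settings are ever lost.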

I think my new NAS should last me at least 10 years.

Windows 10: “We’ve got an update for you” – not responding

This is the second time this has happened to me, so I figured I’d document the solution, because it takes me 30 minutes to resolve each time without hard-resetting the computer.

Problem: on Windows 10, I get a “We’ve got an update for you” message asking me to update the OS or re-schedule, but I cannot select any option because the window does not respond. Windows also disables mouse interaction with all other windows.


Solution:

  • Press Windows Key + X
  • Press “A” or select “Windows PowerShell (Admin)”
  • In the console, type:

taskkill /IM "MusNotificationUx.exe" /F

The non-responsive message should go away.

RSelenium: get all possible values of a drop down menu

To get all possible values of a drop-down menu using RSelenium, leverage the selectTag method after finding the element, as described here:

webElem <- remDr$findElement(using = 'xpath', "//select[@name='txtTargetToMm']")
opts <- webElem$selectTag()

> opts$text
[1] "JANUARY"   "FEBRUARY"  "MARCH"     "APRIL"     "MAY"       "JUNE"      "JULY"      "AUGUST"   
> opts$value
[1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"
> opts$selected

Force new WAN IP from ISP with new WAN MAC address on router

I recently wanted my ISP (Spectrum, formerly Time Warner Cable) to issue me a new IP address. I called the provider, and they informed me they couldn’t do it manually. The only ways for me to get a new IP address were to wait until the next regularly scheduled re-issue date (in a month), or to get a new cable modem (the ISP would assign a new IP to the new MAC address). I opted for the latter because they offered me a free modem as part of my plan, and because I wanted this done ASAP. However, the ISP still issued the same WAN IP to my router when I plugged in the new modem. I next tried plugging my laptop directly into the modem with an ethernet cable, and the ISP did issue me a new IP address. This shows that the ISP issues an IP address based on the device plugged into the modem, not the modem itself.

Thankfully, my router was running Tomato firmware, and I was able to increment my WAN MAC address by one by going to my router’s config page (http://router_ip/advanced-mac.asp), changing the MAC address, and saving the settings. My ISP saw the new MAC address and issued me a new IP address. In a month, I will revert to my old MAC address.
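The “new” MAC only needs to differ from the current one; I simply incremented the last octet. A quick sketch of that arithmetic in shell (the MAC shown is a made-up example):

```shell
mac="aa:bb:cc:dd:ee:0f"                   # current WAN MAC (example value)
prefix="${mac%:*}"                        # aa:bb:cc:dd:ee
last="${mac##*:}"                         # 0f
next=$(printf '%02x' $(( (16#$last + 1) % 256 )))   # add 1, wrap around at ff
new_mac="${prefix}:${next}"
echo "$new_mac"                           # aa:bb:cc:dd:ee:10
```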

NOTE: I originally tried to modify my WAN MAC address using the nvram command:

nvram get wan_hwaddr                     # current WAN MAC address; increment it by one for the new MAC address
nvram show | grep addr                   # show all addresses; make sure the new MAC address is not used by anything
nvram set wan_hwaddr=aa:bb:cc:dd:ee:ff   # set the new MAC address
nvram get wan_hwaddr                     # verify the change
nvram commit                             # save the settings
# reboot the router

I tried executing the code both in the web UI (under Tools) and over SSH. The settings were not saved after reboot.

Note on using T-SQL via R and ODBC

Reminder to self: when trying to query data into R using an existing SQL script that uses T-SQL to access data in Microsoft SQL Server, add SET NOCOUNT ON; to the beginning of the script, then add SET NOCOUNT OFF; before the final SELECT statement (the query returning the data of interest). If this is not done, an empty result set is returned instead.
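The resulting script skeleton looks something like this (the table and column names here are hypothetical):

```sql
SET NOCOUNT ON;   -- suppress "N rows affected" messages from intermediate statements

-- intermediate T-SQL work, eg, staging into a temp table
SELECT id, val
INTO #stage
FROM dbo.source_table
WHERE val IS NOT NULL;

SET NOCOUNT OFF;  -- re-enable the row count for the final query

-- the query returning the data of interest back to R
SELECT id, val
FROM #stage;
```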

Estimating standard errors using the Delta Method in R

I just discovered the msm package in R, which offers the deltamethod function for calculating the standard error of a transformation of estimated regression coefficients. A tutorial can be found here. I recently had to use it to calculate the standard error of a difference in proportions when a logistic regression model was used to estimate the log-odds of an event:

library(msm) # provides deltamethod
fit <- glm(Response ~ treat, data=my_data, family=binomial(link='logit')) # treat is binary
diff <- 1/(1 + exp(-coef(fit)[1] - coef(fit)[2])) - 1/(1 + exp(-coef(fit)[1])) # difference in proportions
se.diff <- deltamethod(~ 1/(1 + exp(-x1 - x2)) - 1/(1 + exp(-x1)), coef(fit), vcov(fit)) # returns the standard error
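For reference, deltamethod implements the standard first-order delta method approximation: for a transformation g of the estimated coefficients with covariance matrix Σ (here, vcov(fit)),

```latex
\operatorname{se}\{ g(\hat\beta) \} \;\approx\; \sqrt{ \nabla g(\hat\beta)^{\top} \, \hat\Sigma \, \nabla g(\hat\beta) }
```

where the gradient is evaluated at the estimates; msm differentiates the supplied formula for you.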

Home Security: record CCTV cameras with ZoneMinder as DVR


I started working on this project and documenting it in late October 2014. I’ve only recently had the time to finish writing it all up.

I was interested in a home surveillance security system for several reasons:

  1. a friend’s house and car were burglarized three times in the last few years, and
  2. another friend was sued by his neighbor, claiming his dogs were excessively loud during the day.

A surveillance camera system would have helped identify the culprits in the first case. In the second case, my friend was able to bring to court video footage from the home CCTV surveillance system he bought from Costco to disprove his neighbor’s claim: on the contrary, the dogs were not barking during the stated times.

In implementing my own CCTV surveillance security system on my premises, I had several options:

  1. purchase a system from a home monitoring security company and make monthly payments for their monitoring service,
  2. purchase a CCTV surveillance system that Costco, Sam’s Club, and many online retailers sell, or
  3. build my own DIY system.

I decided on the DIY route because

  1. it saves me money in the long run (ie, no recurring fees),
  2. I have the freedom to configure it the way I want it,
  3. I am able to reuse the hardware and service if I ever decide to move, and
  4. the stars were aligned regarding the critical components:
    • an energy-efficient server with a small footprint is relatively inexpensive
    • fully mature open source software exists on Linux for monitoring and recording
    • commodity wireless cameras are relatively inexpensive (namely, Foscam FI8910W)
    • mobile phone apps exist for easy remote viewing (Android: tinyCam Monitor and ZmView)

Note: be mindful of the laws regulating surveillance and privacy in your local jurisdiction (for me, it’s California).

The main ingredient of my home surveillance system is ZoneMinder, a Linux CCTV recording platform that doesn’t require much processing power to run. More information about its use can be found here and here.

Regarding cameras, I opted for wireless IP cameras over wired analogue cameras because wiring can make installations time-consuming and is something I’m just not very good at (this was the primary reason why I decided not to install a surveillance system from Costco when I first moved into my current house). In particular, I chose the Foscam FI8910W for inside the house and Foscam FI8904W for outside the house based on a friend’s recommendation.

With the server’s hardware, I wanted to maximize computing power while constraining power consumption (< 30W), cost ($200), and footprint (mini-ITX). It took a couple of days of research, but I found the Zotac ZBOX CI320 nano to be more than sufficient. Based on my testing, it consumes about 9W of power when recording one camera. This isn’t too surprising, as the system is based on a quad-core Intel Celeron processor. I normally refrain from Celerons, but this is plenty of juice for the server’s only job: recording videos using ZoneMinder. I slapped on 8GB of RAM and a 500GB 2.5″ hard drive. Ideally, I would have liked to install a 4TB drive if I had a 3.5″ slot, or maybe a 2TB drive in the 2.5″ slot. However, the drive was given to me for free, and it more than suffices for keeping a few weeks’ worth of video footage readily available. Update: this hard drive died after running for 1.5 years; I upgraded to a WD Red Mobile 2.5″ 1TB server-grade hard drive.

Installing ZoneMinder

When configuring the server, first go into the BIOS (hold DEL during boot) and change the UEFI settings to Windows 7 (Windows 8 by default). Otherwise, the device will not boot without a monitor plugged in, and this will render the device useless as a headless server.

I originally planned to run CentOS 6.5 on the Zotac for stability, but was unable to install it because of some kernel issue when trying to install from the LiveCD. I therefore went with Ubuntu Server 14.04 (LTS), which I’m even more familiar with. During installation, I also installed LAMP and OpenSSH in order to run ZoneMinder and manage the server remotely. Installation was relatively straightforward when installing from the repo (as of 3/24/2016, I used version 1.29.0):

sudo apt-get update && sudo apt-get upgrade
sudo a2enmod cgi
sudo service apache2 restart
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:iconnor/zoneminder
sudo apt-get update
sudo apt-get install zoneminder php5-gd libvlc-dev libvlccore-dev vlc
mysql -uroot -p < /usr/share/zoneminder/db/zm_create.sql
mysql -uroot -p -e "grant all on zm.* to 'zmuser'@localhost identified by 'zmpass';"
mysqladmin -uroot -p reload
sudo chmod 740 /etc/zm/zm.conf
sudo chown root:www-data /etc/zm/zm.conf
sudo adduser www-data video

Add the sleep line in /etc/init.d/zoneminder:

start() {
      sleep 15
      echo -n "Starting $prog: "

Finish configuring with the following:

sudo a2enmod cgi
sudo a2enconf zoneminder
sudo a2enmod rewrite
sudo service zoneminder start

Edit date.time_zone in /etc/php5/apache2/php.ini:

; Defines the default timezone used by the date functions
date.timezone = America/Los_Angeles

Restart the computer.

Configure SSL by doing:

sudo make-ssl-cert generate-default-snakeoil --force-overwrite
sudo a2enmod ssl
sudo a2ensite default-ssl
sudo service apache2 restart

The encrypted url https://server.ip/zm should now work; the unencrypted url is http://server.ip/zm.

Configuring ZoneMinder

  • Go to http://server.ip/zm in the web browser
  • Go to Options > Paths and set PATH_ZMS to /zm/cgi-bin/nph-zms (Caution: make sure you enter only /zm/cgi-bin/nph-zms with no space at the end or your video will not work!)
  • Go to Options > Users: change the default admin username and password and add a new user for viewing only (no admin rights)
  • Go to Options > System: check OPT_USE_AUTH to require user login and OPT_CONTROL to enable control of cameras
  • Go to Filters, select PurgeWhenFull from the Use Filter dropdown. Set Disk Percent to 90 so events (recorded videos) are deleted when the disk reaches 90%. Click Save.
    • The length of saved videos will thus depend on the hard drive’s capacity and number of cameras.
  • On the main page, go to Running > Restart > Apply

Adding Cameras to ZoneMinder

I have a few Foscam FI8910W cameras, so I followed these instructions.

  • Check if /usr/share/perl5/ZoneMinder/Control/ exists on the server; if not, download from
  • Click Add New Monitor
    • In the General tab, give the camera a Name, set Source Type to Remote, and set Function to Mocord (Record and Motion Detection)
    • In the Source tab,
      • Enter in the camera’s IP address in Remote Host Name
      • In the Remote Path field, enter /videostream.cgi?user=visitor_user&pwd=visitor_pw, where visitor_user and visitor_pw are the camera’s login credentials with at least visitor status
      • Enter 640 and 480 for width and height, and select Preserve Aspect Ratio
    • In the Control tab,
      • Set Controllable
      • Select Foscam FI8918W for Control Type
      • Set Control Address to the camera’s IP address
      • Set Control Device to operator_user:operator_pw where operator_user and operator_pw are the camera’s login info with at least operator status (this format is camera firmware specific)

Now, on the main ZoneMinder page, we can view the stream by clicking on each camera. From there, we can also view previously recorded events.

On the ZoneMinder server, the recorded events (videos) are saved at /var/cache/zoneminder/events (many jpeg files since we are using the jpeg protocol of the Foscam camera).

Remote viewing

In order to view ZoneMinder outside the home network, set port forwarding on the local router. For example, one could use the ZmView app on an Android device to view the cameras remotely at https://server.ip:port/zm.

For remote viewing, it might be better to view directly from the cameras (so long as they are set up securely), since viewing the videos through ZoneMinder increases the CPU load. Remember, the primary purpose of ZoneMinder is to record video footage in case it’s needed later.

Secure access to home IP cameras over the web with SSL

IP cameras are cheap these days. I have a few Foscam FI8910W cameras at home as part of my home security system. A friend recently gave me a Foscam C1 to be used as a baby monitor. I try to avoid using the manufacturer’s mobile device apps for viewing the video feeds because 1) I don’t want my private information (video, IP address, credentials, etc) exchanged or leaked to the manufacturer and 2) I don’t want to have to use multiple apps if I buy multiple cameras from different manufacturers. I currently use tinyCam Monitor PRO on my Android device to view the video feed both on the home network and while I’m away. It works very well, is customizable, and supports many different manufacturers.

I took precautions when setting up the cameras on my home network since I know these cameras could be easily hacked with a default setup:

  • Secure home network with a router running Tomato Firmware
  • Use the latest firmware on the cameras
  • Change the default username and password on the cameras
  • Avoid using any of the manufacturer’s services on the camera that makes it easy to view the video feed on the web (eg, DDNS)
  • Use SSL (https) to encrypt the internet traffic/video feed

Regarding the use of SSL, newer cameras like the Foscam C1 should support it by default, so making it world-accessible is as easy as setting up a port forward on the router to the IP camera’s port 443 (or whatever the https port is). However, for older cameras like the Foscam FI8910W that don’t support SSL, one can set up an encrypted reverse proxy using a web server like nginx. Luckily, nginx is available on my router running Tomato firmware, so I don’t have to use a separate Linux server. Here’s how I set it up:

  • On the Administration page of the router running Tomato, enable both http and https access to the router’s administration. Check the “Save to NVRAM” box. The point of this is to create a certificate and key on the router that we could use with nginx so we don’t have to generate these files elsewhere. I ssh’d into the router and found the following files: /tmp/etc/cert.pem and /tmp/etc/key.pem.
  • Go to the Web Server page of the Tomato router, and follow this guide step by step to incrementally set up a reverse proxy on nginx. Paste the final server code inside the “http” section of the web server page on Tomato (assuming we want port 1234 to be the external port for the camera, is the camera’s IP address internally, and 88 is the web port of the camera internally):

server {
    listen 1234;  # doesn't have to be port 443 - could be any port (say 8080);
                  # on port 443 you can drop the port from the URL
    ssl on;
    ssl_certificate  /tmp/etc/cert.pem;
    ssl_certificate_key  /tmp/etc/key.pem;
    # certificate and private key where we just placed them

    ssl_session_timeout 5m;
    # reasonable SSL configuration; disable some known weak ciphers
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 EECDH+ECDSA+SHA256 EECDH+aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+aRSA+RC4 EECDH EDH+aRSA RC4 !aNULL !eNULL !LOW !3DES !MD5 !EXP !PSK !SRP !DSS";

    location / {
        proxy_pass;  # the camera's internal URL
        proxy_redirect $scheme://$host:$server_port/;
        # If the webcam (or other proxied application) ever returns URLs in the HTTP headers
        # (eg, in a 302 or 304 HTTP redirect), the proxy_redirect line rewrites
        # the URL in the HTTP header field
    }
}

  • Every camera should have a separate server block like the one above.
  • Set up port forwarding: external 1234 to, where is the IP address of the router.
  • To make the router respond on these ports (eg, 1234) when requested from the outside world, we need to add a rule to iptables by pasting the following in the “Firewall” section of the Administration > Scripts page (to have it take effect immediately without a reboot, ssh into the router and execute it):
iptables -t filter -A INPUT -p tcp --dport 1234 -j ACCEPT
  • Do the previous step for each camera or port. If this is not done, the external port will not work when we access it from outside the home network (the point of all this!).
  • Turn on the nginx web server on the Web Server page.
  • When away, go to a web browser or mobile app and point to https://external_ip:1234 (enable SSL if using a mobile app) and the video feed should be encrypted and available.

Using Vagrant to scale data science projects with Google Cloud Compute

When I was in graduate school, I made heavy use of the school’s computing infrastructure for my research, scheduling many simulation jobs across multiple (if not all) compute nodes in parallel using Grid Engine. In my professional life, I’ve always pushed to have a computing environment dedicated to research and analysis. This was typically a Linux or UNIX server with many CPU cores (64 in one case) and as much RAM as I could get. The difficulty in getting approval for such an environment depended on the company’s culture, so YMMV. The beauty of the work setup over the graduate school setup is that a job scheduler was never needed, as the number of concurrent users vying for compute cycles is drastically lower at work. When building a computing environment, I always try to build the beefiest server possible (translation: that I could get approval for) because I never want to run into a project that the server can’t handle (eg, loading a very large data set into memory with R). However, it’s hard to future-proof all projects completely, so the line had to be drawn somewhere, and thus the number of CPU cores and the amount of memory had a limit.

Now, with the ubiquity of computing environments offered by different cloud providers (eg, Amazon EC2, Google Compute Engine, Microsoft Azure, HP Public Cloud, Sense, and Domino Data Labs), spinning up an on-demand compute server or a cluster of nodes to scale data analysis projects is pretty simple, straightforward, and relatively cheap (no need to invest thousands of dollars to build a beefy server that reaches full capacity in <1% of the time). One could leverage these cloud services both in the work environment (if one could get it approved) and for personal use (eg, a Kaggle competition).

Sense and Domino Data Labs have servers pre-configured with many standard open source tools for data analysis, and are good options for quickly jumping into analysis. With the more generic cloud providers, one typically spins up a server with a standard Linux OS and then installs and configures the necessary tools. To streamline the scaling process, Vagrant lets one spin up, provision, manage, and destroy servers from the various providers in a simple and consistent manner. I’ll illustrate how one might use Vagrant to spin up a compute server on Google Compute Engine (GCE) to analyze data. I’m choosing GCE at the time of this writing because it appears to be the cheapest and because it charges by the minute (10-minute minimum), unlike Amazon EC2, which charges by the hour.

  • Install the Google Cloud SDK and authenticate:
gcloud init # authenticate; choose a default zone; I chose "us-central1-b", which will show up in subsequent sections
  • Install the Vagrant-Google plugin. On Windows, Vagrant-Google version 1.8.1+ is required, which hasn’t been released at the time of this writing. A work-around is to spin up an Ubuntu server using Vagrant and Virtualbox on Windows, install Vagrant in that Ubuntu server, and go from there. Run the following to install the Vagrant-Google plugin:
vagrant plugin install vagrant-google
  • Follow the instructions here to create a Service Account within the GCE console (website), download the json private key (used by the Google Cloud SDK, call it google_json_key.json), copy the api service account email address and project ID, and add your ssh public key (eg, ~/.ssh/ to the Compute Engine’s metadata.
  • Add the GCE box (template) to Vagrant:
vagrant box add gce
  • Create a project folder (eg, vagrant_google) and cd into it. Everything in this folder will get synced to /vagrant/ for any provisioned server.
  • Create the file Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "gce"

  config.vm.provider :google do |google, override|
    google.google_project_id = "MY_PROJECT_ID"
    google.google_client_email = ""
    google.google_json_key_location = "/absolute/path/to/google_json_key.json" = "us-central1-b"
    google.machine_type = "n1-standard-1"
    # google.machine_type = "n1-highmem-2"
    # google.machine_type = "n1-standard-16"
    # google.machine_type = "n1-highmem-8" = "gce-instance"
    google.image = "ubuntu-1404-trusty-v20160222" # may change, use "gcloud compute images list" to check
    # google.image = "image-ds"
    google.disk_name = "disk-ds" # ds for data science
    google.disk_size = "10"
    override.ssh.username = "MY_USERNAME"
    override.ssh.private_key_path = "/path/to/.ssh/id_rsa"
  end

  config.vm.provision :shell, path: ""
end
  • Create the file (this gets run whenever a server is provisioned via vagrant up):
#! /usr/bin/env bash

touch /tmp/foo # whoami?

# configure mail so I could communicate status updates
cat > /tmp/nail.rc <<EOF
set smtp-use-starttls
set smtp-auth=login
set smtp=smtp://
set smtp-auth-password="MY_GMAIL_PW"
EOF
sudo sh -c 'cat /tmp/nail.rc >> /etc/nail.rc' # append as root; "sudo su" mid-script would not apply to the following lines
rm /tmp/nail.rc

# configure screen
cat > ~/.screenrc <<EOF
escape ^lL
bind c screen 1
bind 0 select 10
screen 1
select 1
autodetach on
startup_message off
EOF
  • Run vagrant up and Vagrant will provision the server.
  • Run vagrant ssh to log into the server via ssh.
  • Install the necessary tools:
# Software installation
sudo su
echo "deb trusty/" > /etc/apt/sources.list.d/r.list
apt-key adv --keyserver --recv-keys E084DAB9
apt-get update 
apt-get -y install r-base-dev libcurl4-openssl-dev libssl-dev git
apt-get -y install heirloom-mailx # setup mail
exit # sudo

# R
sudo dpkg -i MRO-3.2.3-Ubuntu-14.4.x86_64.deb
tar -xzf RevoMath-3.2.3.tar.gz
cd RevoMath
sudo ./ # choose option 1, then agree
cd ..
rm -rf RevoMath
sudo R --vanilla <<EOF
install.packages(c("data.table","readr","randomForest","gbm","glmnet","ROCR","devtools"), repos="")
# options(unzip = 'internal') # vinh
# devtools::install_github("dmlc/xgboost", subdir = "R-package")
install.packages("drat", repos="") #
install.packages("xgboost", repos="", type="source")
EOF

# Python
# scroll and yes

# VW
sudo apt-get -y install libtool libboost1.55-*

# Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get -y install oracle-java7-installer

# H2O
sudo R --vanilla <<EOF
# The following two commands remove any previously installed H2O packages for R.
if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

# Next, we download packages that H2O depends on.
pkgs <- c("methods","statmod","stats","graphics","RCurl","jsonlite","tools","utils")
for (pkg in pkgs) {
    if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg, repos='') }
}

# Now we download, install and initialize the H2O package for R.
install.packages("h2o", type="source", repos=(c("")))
EOF
  • Exit out of the ssh session; the server is still running. Set the root disk to not be deleted when the server is destroyed, then save the root disk as an image. With this saved private image, one can provision multiple servers without having to re-install all the software that was installed on gce-instance. Once the image is saved, restart the instance with the previous root disk, set the disk to auto-delete, and destroy the instance again (this time, the disk will also be destroyed). What’s left is an image that can be re-used. Image storage is not that costly if one wants to keep it around for quickly provisioning a computing environment.
gcloud compute instances set-disk-auto-delete gce-instance --no-auto-delete --disk disk-ds # don't delete root disk
vagrant destroy --force
gcloud compute images create image-ds --source-disk disk-ds --source-disk-zone us-central1-b
gcloud compute images list
vagrant up
gcloud compute instances set-disk-auto-delete gce-instance --auto-delete --disk disk-ds
vagrant destroy # now disk is also destroyed, only image is left.
  • With the saved private image, we could modify Vagrantfile to use this image whenever we spin up a server:
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "gce"

  config.vm.provider :google do |google, override|
    google.google_project_id = "MY_PROJECT_ID"
    google.google_client_email = ""
    google.google_json_key_location = "/absolute/path/to/google_json_key.json" = "us-central1-b"
    google.machine_type = "n1-standard-1"
    # google.machine_type = "n1-highmem-2"
    # google.machine_type = "n1-standard-16"
    # google.machine_type = "n1-highmem-8" = "gce-instance"
    # google.image = "ubuntu-1404-trusty-v20160222"
    google.image = "image-ds"
    google.disk_name = "disk-ds" # ds for data science
    google.disk_size = "10"
    override.ssh.username = "MY_USERNAME"
    override.ssh.private_key_path = "/path/to/.ssh/id_rsa"
  end

  config.vm.provision :shell, path: ""
end

When one needs more CPU’s, memory, or disk size, just modify google.machine_type and google.disk_size per the pricing sheet.

With this setup, I can put relevant files in vagrant_google/my_project. My current work flow is something like the following:

vagrant up # provision server
vagrant ssh # log in
screen # start a persistent terminal session
# paste the following
mkdir -p $logdir

# paste in code that I want to run, eg
R --no-save < $projdir/myfile.R > $logdir/myfile.Rlog
python $projdir/ > $logdir/myfile.pylog
echo '' | mailx -s 'Job Done' $mailaddr
# detach screen session

When the job is done, I will receive a notification email. While the job is running, I can use vagrant ssh to log into the server and re-attach the running screen session to inspect what’s going on or make changes. When I’m done, I can just destroy the server (the contents of /vagrant/ get re-synced to vagrant_google). I could also create batch jobs to start the server, run the code, notify me, and destroy the server in one shot if I didn’t want to run my job interactively.

Conclusion: With Vagrant, open source tools, and the cloud providers, I can spin up as much compute resources as I need in order to scale out my data analysis project. If I had the need to build stand-alone applications, I could also incorporate Docker.

Google Voice using SIP via Simonics

Google Voice ended third-party support for XMPP clients in May 2014. My Asterisk auto-attendant remained functional into 2015, but it finally broke for me: whether or not people call in, some digits get pressed randomly in Asterisk (probably because of some negotiation between Asterisk and GV via XMPP) and calls are forwarded sporadically throughout the day (what I call phantom calls). Is there a way to still leverage Google Voice as my auto-attendant? I found this post about Simon Telephonics offering a GV gateway using SIP. There is a $5 setup fee. I paid it and am now able to continue using Google Voice with Asterisk as my auto-attendant. It also works with any SIP phone or SIP app (eg, CSipSimple). I like this solution because I still get free calls using Google Voice without having to pay another third party for my calls.

To set up on Asterisk, empty out /etc/asterisk/motif.conf and /etc/asterisk/xmpp.conf. Then edit /etc/asterisk/sip.conf:

; simonics
[GV1234567890]
type=friend
username=GV1234567890
secret=simonics_given_pw
qualify=yes
context=tnttsp ; context in extensions.conf
disallow=all
allow=ulaw

1234567890 is the assumed Google Voice telephone number.

In my previous dial plan (/etc/asterisk/extensions.conf), I had to comment out the third line:

exten => s,1,Answer()
exten => s,n,Wait(1)
;; exten => s,n,SendDTMF(1) ;; needed for jabber/xmpp/motif, not SIP via Simonics

SendDTMF(1) was previously required. However, Simon Telephonics sets everything up with Google Voice so this isn’t required.

In the same file, to call 1987654321, change




All is good!