Remotely access files on your home Windows computer

Most of my home computers/servers run Linux, so accessing them remotely is quite easy via the ssh protocol. Even the Windows machines I own have an ssh server installed via Cygwin.

Now, for those not familiar with Linux, one could

  • Install freeSSHd and set up an ssh server on Windows.
  • Configure an account and have freeSSHd initiate at startup.
  • Forward port 22 on the home router to port 22 on the machine with freeSSHd (assign it a static ip from the router); or, use a different external port for safety reasons (eg, 1022 -> 22).
  • Use the portable executable of WinSCP to access your files on any other Windows computer with internet access.

Delimited file where delimiter clashes with data values

A comma-separated values (CSV) file is a typical way to store tabular/rectangular data. If a data cell contain a comma, then the cell with the commas is typically wrapped with quotes. However, what if a data cell contains a comma and a quotation mark? To avoid such scenarios, it is typically wise to use a delimiter that has a low chance of showing up in your data, such as the pipe (“|”) or caret (“^”) character. However, there are cases when the data is a long string with all sorts of data characters, including the pipe and caret characters. What then should the delimiter be in order to avoid a delimiter collision? As the Wikipedia article suggests, using special ASCII characters such as the unit/field separator (hex: 1F) could help as they probably won’t be in your data (no keyboard key that corresponds to it!).

Currently, my rule of thumb is to use pipe as the default delimiter. If the data contains complicated strings, then I’ll default to the field separator character. In Python, one could refer to the field separator as ‘\ x1f’. In R, one could refer to it as ‘\ x1F’. In SAS, it could be specified as ‘1F’x. In bash, the character could be specified on the command line (e.g., using the cut command, csvlook command, etc) by specifying $’1f’ as the delimiter character.

If the file contains the newline character in a data cell (\n), then the record separator character (hex: 1E) could be used for determining new lines.

Make Windows like Linux

I’m back to a job that only allows Windows on our laptops and desktops. Here’s how I configured my workstation to be more Linux-like in order to increase my productivity:

  • Install Google Chrome and Firefox
  • Install an antivirus or security suite (Norton or Mcafee?); a free one is Avast
  • Map my caps lock key to control; if Admin access is not available, then use AutoHotKey by creating caps_to_control.ahk with Capslock::Control and creating a Startup shortcut
  • Download Cygwin and install the following: xinit (X server), python (2 and 3), gcc-*, openssh, screen, rsync, python-setuptools (easy_install)), git, subversion, xwinclipboard, procps (top command + others), aspell (I use flyspell in emacs), aspell-en, make, zip, unzip, patch, wget, perl, perl-dbi, libcrypt-devel (for perl DBD:ODBC), automake, autoconf, email (modify /etc/email/email.conf and enter correct server and credential information; sendmail is not needed for sending outbound emails, only needed to send internal emails)
  • Install the emacs binaries to C:\Documents and Settings\my_username\bin\emacs-ver_num (Windows XP) or C:\Users\my_username\bin\emacs-ver-num (Windows 7) and copy relevant image library dll files into the emacs binary directory in order for doc-view to work properly (eg, need libpng14-14.dll for emacs 24.3
    • See Vincent Goulet’s emacs distro to see what dll files are needed)
    • One could also use emacs w32 provided by cygwin, but ESS doesn’t seem to work because that version of emacs does not have the function w32-short-file-name compiled with it as needed by ESS; tentative solution can be found here, but it’s probably better to use the compiled emacs binaries available on GNU
  • Set the HOME environment variable to C:\Documents and Settings\my_username or C:\Users\my_username; set LC_ALL and LANG to en_US.utf-8 (for perl dbi error); set CYGWIN=nodosfilewarning
  • Edit environment variables by running the following in the command prompt: rundll32 sysdm.cpl,EditEnvironmentVariables. Add the following to the PATH environment variable: path_to_R;path_to_JRE;C:\Documents and Settings\my_username\My Documents\bin;C:\Documents and Settings\my_username\My Documents\bin\emacs-24.3\bin;C:\cygwin\bin. If we cannot edit System variables, then edit the user’s variables (eg, PATH to be path1;path2;%PATH%). If the settings aren’t saved for future sessions (eg, in Citrix), then create a symbolic link from /home/user_id to the desired home (eg, C:/Users/user_id), and add the following to ~/.bashrc: export PATH=/cygdrive/f/R/R-3.1.1/bin/x64:/cygdrive/f/bin/emacs-24.3/bin:/cygdrive/f/bin:/cygdrive/f/bin/cygwin64/bin:$PATH and export JAVA_HOME=F:/bin/jre7.
  • Add the following to ~/.bash_profile: . ~/.bashrc
  • Fix the carriage return issue (only for Windows XP)
  • Run touch ~/.startxwinrc to prevent xterm from launching whenever X server is started.
  • Install R using the Windows installer (works with Emacs ESS); install R studio
  • Get sshfs to work on Windows to mount my servers
  • Install Dropbox and symlink my .bashrc and .screenrc files
  • Get tramp in emacs to work properly in order to visit remote servers easily in emacs by first getting the latest copy of tramp, then configure and byte-compile the code (make) per the proper installation. Add plink, pageant, and all putty-related binaries into the PATH (~/bin). After creating an ssh key, use the putty kegen to convert id_rsa to id_rsa.ppk. Create a shortcut at startup that launches pageant /path/to/id_rsa.ppk. Then in emacs, one could access remote files using tramp via plink: /plink:username@host:/path/to/dir.
  • Copy X Windows shortcut to the Startup folder to automatically start it
  • Add export DISPLAY:0.0= to ~/.bashrc
  • Python packages: numpy, pandas, csvkit (easy_install works but pip install does not), jedi, epc, pyodbc, mysqldb, pymssql, psycopg; ddlgenerator (Python 3).
  • R packages: RODBC, RMySQL, RPostgreSQL, ggplot2, plyr, glmnet
  • Perl packages (for edbi in emacs): cpan RPC::EPC::Service, cpan YAML, cpan -i DBD::ODBC (gcc4 error; edit Makefile and change “CC=gcc4″ to “gcc”; “LD=g++”)
  • Set up ssh server via Cygwin and open up port 22 in Windows Firewall; freeSSHd is also an alternative
  • On Windows 8, the user might not be able to change group permissions (eg, can’t ssh using keys because the key is “too open”); fix by changing files/directories group to ‘User’
  • Use Autopatcher to install download all necessary updates and install them all at once
  • Have the following shortcut in the startup folder in order to have a a terminal open up at startup with screen initiated: C:\cygwin\bin\mintty.exe -e screen -s bash; in the shortcut, specify the home directory as the ‘Start In’ path
  • Install UniKey for typing in Vietnamese (place in ~/Documents/bin), 7-Zip for handling archive files, and Virtual CloneDrive for handling disk image files
  • Install CutePDF Writer (also download and install Ghostscript from CutePDF) for printing to PDF files
  • Install Java Runtime Environment (JRE); if admin privileges aren’t available, then extract the files manually into ~/Documents/bin/jre/
  • Other tools per Lifehacker: VLC, PDF-XChange, Foxit PDF Viewer
  • Configure sshd using openssh, make sure it starts at startup (Start > Run > Services; look for CYGWIN sshd), and allow /usr/sbin/sshd to pass through the Firewall; one could alternatively use freeSSHd
  • Bash shell in emacs via shell: add (setq shell-file-name "bash")

and (setq explicit-shell-file-name shell-file-name) to the emacs init file, and add the following to ~/.bashrc:

if [ "$EMACS" == "t" -a "$OSTYPE" == "cygwin" ]; then
    PS1='\u@\h \w\n$ '

If Dropbox cannot be installed then symlink my ~/.emacs.d directory (need to use mklink in order for symlink to work properly).

If the computer is a dual-boot with Linux installed first, then one can change the order of the bootloader to Windows by following these instructions.

This is a good post to review.

Find text or string in files of a certain type

One can use grep "mystring" myfile.ext to find the lines in myfile.ext containing mystring. One could also use grep "mystring" *.ext to find mystring in all files with extension ext. Similarly, one could use grep "mystring" /directory to search for mystring in all files in the directory. What if one wants to search for mystring in all *.ext files in a certain path /directory? Most posts online would suggest something along the line of

<pre class="src src-sh">find /directory -type -f -name <span style="color: #ffa07a;">"*.ext"</span> | xargs grep <span style="color: #ffa07a;">"mystring"</span>

However, the comments of this post shows how one could do it with grep:

<pre class="src src-sh">grep -r --include=*.ext <span style="color: #ffa07a;">"mystring"</span> /directory

Execute shell commands with an asterisk in SAS

I wanted to use %sysexec to execute a shell command with an asterisk (shell wildcard for globbing) in a SAS program:

<pre class="src src-sas"><span style="color: #7fffd4;">%sysexec</span> cp /tmp/foo<span style="color: #ff4500;">/*</span><span style="color: #ff4500;">.txt /tmp/bar ;</span>

However, it wasn’t giving me the desired results, probably due to the /* characters as they begin a commented section in a SAS program. Also tried escaping the asterisk with \* and surrounding the shell command with quotes but I didn’t get any luck. Emailed the SAS-L community for help and discovered the x and call system statements in SAS. The following works:

<pre class="src src-sas">x <span style="color: #ffa07a;">"cp /tmp/foo/*.txt /tmp/bar"</span> ;

/ or / data null ; call system(“cp /tmp/foo/*.txt /tmp/bar”) ; run ;

More information on executing shell commands in a SAS program can be found here.