Automatically specify line break options with termstr as CRLF or LF in SAS when importing data

It could be annoying when dealing with data from multiple platforms: Windows uses the carriage return (CR) and line feed (LF) to indicate a new line, UNIX uses LF, and Mac uses CR. Most companies have SAS running on a UNIX/Linux server, and it’s hard to tell which characters indicate a new line without going to a terminal to inspect the file.

Here’s a sas macro that creates a filename handle that could be used in PROC IMPORT or a DATA step. It will automatically detect CRLF and if not, default to LF. This assumes SAS is running on a UNIX server with access to the head and awk commands.

%macro handle_crlf(file, handle_name, other_filename_options=) ;
/* if there is a carriage return at the end, then return 1 (stored in macro variable SYSRC) */
%sysexec head -n 1 "&file" | awk '/\r$/ { exit(1) }' ;
%if &SYSRC=1 %then %let termstr=crlf ;
%else %let termstr=lf ;
filename &handle_name "&file" termstr=&termstr &other_filename_options ;
%mend ;
%handle_crlf(file=/path/to/file.txt, handle_name=fh) ;
proc import data=fh dbms=dlm replace outdata=d1 ;
   delimiter='|' ;
run ;

Make Windows like Linux

I’m back to a job that only allows Windows on our laptops and desktops. Here’s how I configured my workstation to be more Linux-like in order to increase my productivity:

  • Install Google Chrome and Firefox
  • Install an antivirus or security suite (Norton or Mcafee?); a free one is Avast
  • Map my caps lock key to control; if Admin access is not available, then use AutoHotKey by creating caps_to_control.ahk with Capslock::Control and creating a Startup shortcut
  • Download Cygwin and install the following: xinit (X server), python (2 and 3), gcc-*, openssh, screen, rsync, python-setuptools (easy_install)), git, subversion, xwinclipboard, procps (top command + others), aspell (I use flyspell in emacs), aspell-en, make, zip, unzip, patch, wget, perl, perl-dbi, libcrypt-devel (for perl DBD:ODBC), automake, autoconf, email (modify /etc/email/email.conf and enter correct server and credential information; sendmail is not needed for sending outbound emails, only needed to send internal emails)
  • Install the emacs binaries to C:\Documents and Settings\my_username\bin\emacs-ver_num (Windows XP) or C:\Users\my_username\bin\emacs-ver-num (Windows 7) and copy relevant image library dll files into the emacs binary directory in order for doc-view to work properly (eg, need libpng14-14.dll for emacs 24.3
    • See Vincent Goulet’s emacs distro to see what dll files are needed)
    • One could also use emacs w32 provided by cygwin, but ESS doesn’t seem to work because that version of emacs does not have the function w32-short-file-name compiled with it as needed by ESS; tentative solution can be found here, but it’s probably better to use the compiled emacs binaries available on GNU
  • Set the HOME environment variable to C:\Documents and Settings\my_username or C:\Users\my_username; set LC_ALL and LANG to en_US.utf-8 (for perl dbi error); set CYGWIN=nodosfilewarning
  • Edit environment variables by running the following in the command prompt: rundll32 sysdm.cpl,EditEnvironmentVariables. Add the following to the PATH environment variable: path_to_R;path_to_JRE;C:\Documents and Settings\my_username\My Documents\bin;C:\Documents and Settings\my_username\My Documents\bin\emacs-24.3\bin;C:\cygwin\bin. If we cannot edit System variables, then edit the user’s variables (eg, PATH to be path1;path2;%PATH%). If the settings aren’t saved for future sessions (eg, in Citrix), then create a symbolic link from /home/user_id to the desired home (eg, C:/Users/user_id), and add the following to ~/.bashrc: export PATH=/cygdrive/f/R/R-3.1.1/bin/x64:/cygdrive/f/bin/emacs-24.3/bin:/cygdrive/f/bin:/cygdrive/f/bin/cygwin64/bin:$PATH and export JAVA_HOME=F:/bin/jre7.
  • Add the following to ~/.bash_profile: . ~/.bashrc
  • Fix the carriage return issue (only for Windows XP)
  • Run touch ~/.startxwinrc to prevent xterm from launching whenever X server is started.
  • Install R using the Windows installer (works with Emacs ESS); install R studio
  • Get sshfs to work on Windows to mount my servers
  • Install Dropbox and symlink my .bashrc and .screenrc files
  • Get tramp in emacs to work properly in order to visit remote servers easily in emacs by first getting the latest copy of tramp, then configure and byte-compile the code (make) per the proper installation. Add plink, pageant, and all putty-related binaries into the PATH (~/bin). After creating an ssh key, use the putty kegen to convert id_rsa to id_rsa.ppk. Create a shortcut at startup that launches pageant /path/to/id_rsa.ppk. Then in emacs, one could access remote files using tramp via plink: /plink:username@host:/path/to/dir.
  • Copy X Windows shortcut to the Startup folder to automatically start it
  • Add export DISPLAY:0.0= to ~/.bashrc
  • Python packages: numpy, pandas, csvkit (easy_install works but pip install does not), jedi, epc, pyodbc, mysqldb, pymssql, psycopg; ddlgenerator (Python 3).
  • R packages: RODBC, RMySQL, RPostgreSQL, ggplot2, plyr, glmnet
  • Perl packages (for edbi in emacs): cpan RPC::EPC::Service, cpan YAML, cpan -i DBD::ODBC (gcc4 error; edit Makefile and change “CC=gcc4″ to “gcc”; “LD=g++”)
  • Set up ssh server via Cygwin and open up port 22 in Windows Firewall; freeSSHd is also an alternative
  • On Windows 8, the user might not be able to change group permissions (eg, can’t ssh using keys because the key is “too open”); fix by changing files/directories group to ‘User’
  • Use Autopatcher to install download all necessary updates and install them all at once
  • Have the following shortcut in the startup folder in order to have a a terminal open up at startup with screen initiated: C:\cygwin\bin\mintty.exe -e screen -s bash; in the shortcut, specify the home directory as the ‘Start In’ path
  • Install UniKey for typing in Vietnamese (place in ~/Documents/bin), 7-Zip for handling archive files, and Virtual CloneDrive for handling disk image files
  • Install CutePDF Writer (also download and install Ghostscript from CutePDF) for printing to PDF files
  • Install Java Runtime Environment (JRE); if admin privileges aren’t available, then extract the files manually into ~/Documents/bin/jre/
  • Other tools per Lifehacker: VLC, PDF-XChange, Foxit PDF Viewer
  • Configure sshd using openssh, make sure it starts at startup (Start > Run > Services; look for CYGWIN sshd), and allow /usr/sbin/sshd to pass through the Firewall; one could alternatively use freeSSHd
  • Bash shell in emacs via shell: add (setq shell-file-name "bash")

and (setq explicit-shell-file-name shell-file-name) to the emacs init file, and add the following to ~/.bashrc:

if [ "$EMACS" == "t" -a "$OSTYPE" == "cygwin" ]; then
    PS1='\u@\h \w\n$ '

If Dropbox cannot be installed then symlink my ~/.emacs.d directory (need to use mklink in order for symlink to work properly).

If the computer is a dual-boot with Linux installed first, then one can change the order of the bootloader to Windows by following these instructions.

This is a good post to review.

Find text or string in files of a certain type

One can use grep "mystring" myfile.ext to find the lines in myfile.ext containing mystring. One could also use grep "mystring" *.ext to find mystring in all files with extension ext. Similarly, one could use grep "mystring" /directory to search for mystring in all files in the directory. What if one wants to search for mystring in all *.ext files in a certain path /directory? Most posts online would suggest something along the line of

<pre class="src src-sh">find /directory -type -f -name <span style="color: #ffa07a;">"*.ext"</span> | xargs grep <span style="color: #ffa07a;">"mystring"</span>

However, the comments of this post shows how one could do it with grep:

<pre class="src src-sh">grep -r --include=*.ext <span style="color: #ffa07a;">"mystring"</span> /directory

sftp with restricted folder

I recently needed to set up an ftp server (or sftp server) that allows the user to transfer files. I had some restrictions:

  1. The account cannot have ssh access since I don’t want an unauthorized person to run jobs on the server.
  2. The account needs to be restricted to a single directory. I don’t want the account to have access to all files on the server.

I first followed this guide to get proftpd up with an account. However, I kept getting errors trying to log in using Nautilus or Filezilla. The error came from PASV mode, which I think stems from a firewall/NAT issue. I next tried this to use vsftpd. Still no go (same error).

I decided to use sftp since I know for sure ssh works and that it’s more secure. Now that I think about it, none of my server has an ftp server running since sftp is more secure and Nautilus and Filezilla supports the sftp protocol.

From this post, I re-discovered rssh and the native support from recent versions of openssh. The “match user” method for openssh and the rssh method did not work for me. I finally stumbled on this post that made things work.

sudo apt-get install openssh ## this is already installed for me
## modify /etc/ssh/sshd_config
# Use the following line to *replace* any existing 'Subsystem' line
Subsystem sftp internal-sftp

# These lines must appear at the *end* of sshd_config
Match Group sftponly
ChrootDirectory %h
ForceCommand internal-sftp
AllowTcpForwarding no

## in shell
sudo groupadd sftponly
sudo useradd newuser
sudo passwd newuser ## set password
sudo usermod -g sftponly -s /bin/false -d /home/newuser newuser
sudo chown root:root /home/newuser
cd /home/newuser
sudo mkdir upload ## upload files in here
sudo chown newuser:newuser upload
sudo /etc/init.d/ssh restart

Now, ssh with the newuser should not work, and sftp (via command line, nautilus, or filezilla) should only access one location.

Note that /home/newuser is own by root, so newuser can’t do much in there. Create a directory upload, and make newuser the owner.

LPR printing on command line

These days, with Linux distros such as Ubuntu, printing is quite easy. You can add and remove printers and print files just as easily (or even more so) compared to Windows or Mac. Sometimes I might just want to print from the command line. To do so, I make use of the lpr command.

To find out what printers are available, type:

 <pre class="src src-sh">lpstat -p -d <span style="color: #ff4500;">## </span><span style="color: #ff4500;">from CUPS manual</span>

This and this gives a good introduction to lpr.

 <pre class="src src-sh">lpr filename <span style="color: #ff4500;">## </span><span style="color: #ff4500;">print file using default printer</span>

lpr -P printername filename ## print file using the selected printer; see lpstat -p -d for list of printers lpr -# num filename ## to print num times

openoffice -p filename.doc ## for doc files

lpr should support txt, ps, and pdf files.

Delete .DS_Store files

I have a bunch of .DS_Store files in many directories in my home folder because I transferred them from my old system, a Mac OS X. To remove them, do

 <pre class="src src-sh">find . -name .DS_Store -exec rm <span style="color: #ffa07a;">"{}"</span> <span style="color: #ffa07a;">';'</span>

in ~/. Got this in the comments of this post.

Control my computer’s desktop (graphically) via VNC

Since all my computers are Linux-based, I have OpenSSH installed on them so I can connect to them remotely. If I am not on the home network, I either have ports forwarded from the router or VPN to my home network in order to connect to my destination. In addition, I almost always use screen for all my terminal sessions. Thus, once I ssh to the computer remotely, I can resume my screen session.

What if I wanted to control the current desktop of my computer, i.e., control applications graphically? Answer: VNC. On Ubuntu 10.10, vino is installed by default and it could be configured from within GNOME by going to System > Preferences > Remote Desktop. For other VNC Servers, see this.

To keep things secured, I don’t port forward port 5900 from my home router to the main computer. If I wanted to VNC into the machine, I will VPN to the home network first. Or better yet, I can port-forward via SSH.

Encrypted Connection via SSH port-forwarding

As the data from VNC is not encrypted, it is not safe to use across the internet. To use an encrypted connection, one can use the port-forwarding feature of OpenSSH to create one.

On the local machine (VNC from), type the following in the shell:

ssh -L 5900:localhost:5900 ## VNC to; if on local network via VPN, use local ip or hostname

Now, from the local machine, I can connect to localhost from any VNC client. Vinagre is the default on Ubuntu, accessible via Internet > Remote Desktop Viewer.

This is quite cool.

Update system BIOS on a Linux machine with a Windows/DOS updater

I recently had to update my BIOS on my Toshiba Portege R705 laptop. Unfortunately, the updater can only be run on a Windows/DOS system. Luckily, I remembered reading this article on Linux Journal that described how one can boot into FreeDOS to update the system BIOS. I will outline what I did.

  1. Find a USB drive with a capacity < 2 GB.
  2. Using GParted, format the drive to be FAT16.
  3. Download fdbasecd.iso and install qemu (sudo apt-get install qemu).
  4. Boot up FreeDOS with qemu: qemu -hda /dev/sdb -cdrom /path/to/fdbasecd.iso -boot d (assuming usb drive is /dev/sdb). Continue booting from “cd-rom” (iso file) and install FreeDOS to “harddisk” (usb stick) with default values.
  5. Do not reboot system after FreeDOS is done installing. Type following in DOS prompt:
  <pre class="src src-sh">fdisk /mbr 1

cd edit fdconfig.sys ## for line that starts with “SHELLHIGH…”, change to “1234?SHELLHIGH…” ## save: Alt-f ## exit to Dos, exit qemu

  1. Test if USB drive will boot using qemu: qemu -hda /dev/sdb -boot c
  2. Copy .exe file into usb drive. When I tried executing it in FreeDOS, it says it can’t because it is a Win32 file. I extracted the EXE file and read the README. It contained some files I can run in DOS.
  3. Restart system and boot with USB drive. On my Toshiba laptop, I had to press ESC then F12 to select USB drive.
  4. Once in DOS, run the command to update BIOS. NOTE: I take no responsibilities for your actions. Update your BIOS at your own risk. You can brick the machine if you aren’t doing things correctly or use the wrong BIOS file.

Find files and find files containing certain text

This is a reminder to myself as I keep forgetting how to do these basic searches in Linux.

To find files with file name containing the text foo using the command find:

## find file with "foo" in file name
find ./ -name "*foo*" ## replace ./ with path; can use shell style wildcards
## ignore upper and lower cases
find ./ -iname "*foo*"
## print pathnames of all files
find ./ -print

To find files with bar in it’s content using the command grep:

grep "bar" -r ./
## print lines without the word "bar"
grep -v "bar" -r ./
## note, can also use regexp with -E