This is a cool use of a Raspberry Pi!
The easiest way to import a delimited file (e.g., CSV) in SAS is to use PROC IMPORT:
proc import datafile="/path/to/my_file.txt" out=work.my_data dbms=dlm replace ;
  delimiter="|" ;
  guessingrows=32000 ;
run ;
PROC IMPORT isn’t a viable option when the fileref used in the datafile argument is not of the DISK type. For example, the fileref my_pipe defined below would not work:
filename my_pipe pipe "gunzip -c my_file.txt.gz" ;
This is because SAS needs “random access” to the file (e.g., to scan rows when determining the variable types), which a pipe cannot provide.
PROC IMPORT also isn’t suitable for a very large data set in which one of the columns might contain a very long value that only appears after the number of rows specified by guessingrows. In my experience, one should use the following options in the infile statement to avoid unnecessary errors:
- missover (don’t go to the next line if a line ends early; the remaining variables are set to missing),
- dsd (treat two consecutive delimiters as an empty, i.e., missing, field), and
- lrecl (make this big for long lines; in older SAS releases it defaults to 256, so lines longer than 256 characters are truncated).
When the file read by the infile statement is delimited, it is easy to import the fields using list input. However, one should use the length statement to declare the maximum length of each character variable, and the informat statement for numeric variables that have special formats (dates, dollar amounts, etc.). I usually forget and just put the informats after the variables in the input statement, which only works with formatted input using the input pointer (e.g., @27 my_var date9.) or with the colon modifier in list input (e.g., my_var :date9.). Here is an example:
filename my_pipe pipe "gunzip -c my_file.txt.gz" ;
data my_data ;
  infile my_pipe dlm="|" dsd missover lrecl=50000 ;
  length x2 $50 x3 $25 ;
  informat x4 date9. ;
  format x4 date9. ;
  input x1 x2 $ x3 $ x4 ;
run ;
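Outside SAS, the same idea (stream the decompressed file, split on the delimiter, declare each column’s type and maximum length up front) can be sketched in Python. This is a rough analogue, not a translation of SAS semantics: the column names x1 to x4, the 50/25 character limits and the DATE9.-style `%d%b%Y` date parsing simply mirror the example above, and the helper names are hypothetical. Reading through `gzip.open` needs no random access, empty fields between consecutive delimiters come back as empty strings (similar to DSD), and short lines are padded with missing values (similar in spirit to MISSOVER).

```python
import csv
import gzip
from datetime import datetime


def parse_date9(text):
    """Parse a DATE9.-style string such as '01JAN2020'; empty -> None (missing)."""
    if not text:
        return None
    return datetime.strptime(text, "%d%b%Y").date()


def to_float(text):
    """Numeric field; an empty string becomes None (missing)."""
    return float(text) if text else None


# Hypothetical layout mirroring the SAS example: (name, converter, max_length).
# A converter of None marks a character column limited to max_length chars.
COLUMNS = [
    ("x1", to_float, None),
    ("x2", None, 50),
    ("x3", None, 25),
    ("x4", parse_date9, None),
]


def read_pipe_delimited_gz(path):
    """Yield one dict per line of a gzipped, '|'-delimited file, streaming it."""
    with gzip.open(path, mode="rt", newline="") as handle:
        for fields in csv.reader(handle, delimiter="|"):
            # Pad short lines so missing trailing fields become None
            # (instead of reading values from the next line).
            fields += [""] * (len(COLUMNS) - len(fields))
            record = {}
            for (name, convert, max_len), raw in zip(COLUMNS, fields):
                if convert is None:
                    # Character column: values longer than max_len are cut,
                    # which is why the declared length must be generous.
                    record[name] = raw[:max_len] or None
                else:
                    record[name] = convert(raw)
            yield record
```

For example, `for row in read_pipe_delimited_gz("my_file.txt.gz"): print(row)` prints one dict per input line.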
I’m back to a job that only allows Windows on our laptops and desktops. Here’s how I configured my workstation to be more Linux-like in order to increase my productivity:
- Install Google Chrome and Firefox
- Install an antivirus or security suite (Norton or McAfee?); a free one is Avast
- Map my caps lock key to control; if Admin access is not available, then use AutoHotkey by creating a script containing Capslock::Control and creating a Startup shortcut to it
- Download Cygwin and install the following:
python (2 and 3),
top command + others),
aspell (I use flyspell in emacs),
/etc/email/email.conf and enter correct server and credential information;
sendmail is not needed for sending outbound emails, only needed to send internal emails)
- Install the emacs binaries to C:\Documents and Settings\my_username\bin\emacs-ver_num (Windows XP) or C:\Users\my_username\bin\emacs-ver_num (Windows 7), and copy the relevant image library dll files into the emacs binary directory so that doc-view works properly (e.g., emacs 24.3 needs libpng14-14.dll; see Vincent Goulet’s emacs distribution to find which dll files are needed)
- One could also use the w32 emacs provided by Cygwin, but ESS doesn’t seem to work with it because that version of emacs is not compiled with the w32-short-file-name function that ESS needs; a tentative solution can be found here, but it’s probably better to use the compiled emacs binaries available from GNU
- Set the HOME environment variable to C:\Documents and Settings\my_username or
en_US.utf-8 (for perl dbi error); set
- Edit environment variables by running the following in the command prompt: rundll32 sysdm.cpl,EditEnvironmentVariables. Add the following to the
path_to_R;path_to_JRE;C:\Documents and Settings\my_username\My Documents\bin;C:\Documents and Settings\my_username\My Documents\bin\emacs-24.3\bin;C:\cygwin\bin. If we cannot edit the System variables, then edit the user’s variables (e.g., path1;path2;%PATH%). If the settings aren’t saved for future sessions (e.g., in Citrix), then create a symbolic link from /home/user_id to the desired home (e.g., C:/Users/user_id), and add the following to
- Add the following to
- Fix the carriage return issue (only for Windows XP)
touch ~/.startxwinrc to prevent xterm from launching whenever the X server is started.
- Install R using the Windows installer (works with Emacs ESS); install RStudio
sshfs to work on Windows to mount my servers
- Install Dropbox and symlink my
tramp in emacs to work properly in order to visit remote servers easily in emacs by first getting the latest copy of tramp, then configure and byte-compile the code (make) following the installation instructions. Add
pageant, and all putty-related binaries into the PATH (~/bin). After creating an ssh key, use PuTTYgen to convert it to id_rsa.ppk. Create a shortcut at startup that launches pageant /path/to/id_rsa.ppk. Then in emacs, one could access remote files using tramp via
X Windows shortcut to the Startup folder to automatically start it
export DISPLAY=:0.0 to
- Python packages:
pip install does not),
- R packages:
- Perl packages (for edbi in emacs):
cpan -i DBD::ODBC (gcc4 error; edit the Makefile and change “CC=gcc4” to “CC=gcc” and set “LD=g++”)
- Set up ssh server via Cygwin and open up port 22 in Windows Firewall; freeSSHd is also an alternative
- On Windows 8, the user might not be able to change group permissions (e.g., can’t ssh using keys because the key is “too open”); fix this by changing the group of the files/directories to ‘Users’
- Use Autopatcher to download all necessary updates and install them all at once
- Put the following shortcut in the startup folder so that a terminal opens at startup:
C:\cygwin\bin\mintty.exe -e screen -s bash; in the shortcut, specify the home directory as the ‘Start In’ path
- Install UniKey for typing in Vietnamese (place in
~/Documents/bin), 7-Zip for handling archive files, and Virtual CloneDrive for handling disk image files
- Install CutePDF Writer (also download and install Ghostscript from CutePDF) for printing to PDF files
- Install Java Runtime Environment (JRE); if admin privileges aren’t available, then extract the files manually into
- Other tools per Lifehacker: VLC, PDF-XChange, Foxit PDF Viewer
- Configure sshd using
openssh, make sure it starts at startup (Start > Run > Services; look for
CYGWIN sshd), and allow
/usr/sbin/sshd to pass through the Firewall; one could alternatively use freeSSHd
- Bash shell in emacs: add
(setq shell-file-name "bash")
(setq explicit-shell-file-name shell-file-name) to the emacs init file, and add the following to
# http://stackoverflow.com/questions/9670209/cygwin-bash-does-not-display-correctly-in-emacs-shell
if [ "$EMACS" == "t" -a "$OSTYPE" == "cygwin" ]; then
  PS1='\u@\h \w\n$ '
fi
If Dropbox cannot be installed, then symlink my ~/.emacs.d directory (mklink is needed for the symlink to work properly).
If the computer dual-boots with Linux installed first, one can make Windows the default entry in the bootloader by following these instructions.
This is a good post to review.
One can use grep "mystring" myfile.ext to find the lines in myfile.ext that contain mystring. One could also use grep "mystring" *.ext to find mystring in all files with extension ext. Similarly, one could use grep "mystring" /directory/* to search for mystring in all files in the directory (without -r, grep does not read a directory given as an argument). What if one wants to search for mystring in all *.ext files under a certain path /directory, including its subdirectories? Most posts online would suggest something along the lines of
find /directory -type f -name "*.ext" | xargs grep "mystring"
However, the comments on this post show how one could do it with grep alone:
grep -r --include="*.ext" "mystring" /directory
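For comparison, the `grep -r --include` search can be sketched in Python with `os.walk` and `fnmatch`. This is a simplified stand-in, not a grep replacement: it does plain substring matching (like `grep -F`) rather than grep’s regular expressions, and the function name `grep_recursive` is hypothetical.

```python
import fnmatch
import os


def grep_recursive(pattern, directory, include="*"):
    """Yield (path, line_number, line) for every line containing `pattern`
    in files under `directory` whose names match the glob `include`."""
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # visit subdirectories in a deterministic order
        for name in sorted(files):
            if not fnmatch.fnmatch(name, include):
                continue  # same role as --include="*.ext"
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern in line:
                        yield path, number, line.rstrip("\n")
```

Usage: `for path, n, line in grep_recursive("mystring", "/directory", include="*.ext"): print(f"{path}:{n}:{line}")`.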
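To change your primary group on Linux, do the following (usermod requires root privileges):
## see which groups you belong to
groups
## safely change the primary group; -G re-lists the supplementary groups so none are lost
usermod -g NewPrimaryGroup -G oldgroup1,oldgroup2,oldgroup3 my.username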
Native GUI client access to MS-SQL and MySQL
Overview of ODBC on Mac OS X
Mac OS X has iODBC installed as its default ODBC manager. Most other Linux/UNIX systems use unixODBC to manage ODBC drivers. This is the main reason why there’s so much confusion about getting ODBC to work on Mac OS X.
ODBC is essentially an API that lets any software access any DBMS, regardless of which DBMS it is and which OS it runs on. Software (e.g., R or Python) uses ODBC to access a DBMS through the following chain: Software -> ODBC Manager -> ODBC Driver for the DBMS -> DBMS Server (Software: R, Python, etc.; DBMS: MySQL, MS-SQL, etc.).
It doesn’t matter whether you use iODBC or unixODBC. Whichever one you use, just make sure the DBMS driver and the software you are using are configured/compiled for the same ODBC manager (usually set through configure flags). For example, the R package RODBC and the Python package pyodbc are compiled by default to use iODBC on Mac OS X, so the DBMS drivers must also be compiled for use with iODBC. For iODBC, one can add data source names (DSNs) in ~/Library/ODBC/odbc.ini. For unixODBC, one could add DSNs at
My current setup utilizes iODBC. I will outline the instructions for setting up MySQL and freeTDS (MS-SQL) drivers for use with RODBC and pyodbc through iODBC.
MySQL and FreeTDS with iODBC on Mac OS X
Install the MySQL Connector/ODBC driver. Driver should be at
/usr/local/lib/libmyodbc5w.so. Note: I’m unable to compile the driver from source on Mac OS X.
FreeTDS is an open source ODBC driver for accessing MS SQL Server. Install it via Homebrew:
## install homebrew
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
## install freetds
brew install freetds
Driver should be at /usr/local/lib/libtdsodbc.so (a symbolic link). Then add the DSNs to ~/Library/ODBC/odbc.ini, for example:
[sqlserver01]
Driver=/usr/local/lib/libtdsodbc.so
TDS_Version=4.2
Server=ip.address
Port = 1433
Trace = Yes
Description=my description
# Database=
# can't specify username and password for freetds

[mysql01]
Driver=/usr/local/lib/libmyodbc5.so
Server=hostname
Port=3306
charset=UTF8
User=username
Password=password
# Database=
## can specify an actual database to each DSN
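A frequent cause of failed connections is a DSN whose `Driver=` path does not exist (note that the MySQL driver path mentioned earlier, libmyodbc5w.so, differs from the one in this odbc.ini, libmyodbc5.so). The hypothetical helpers below are a sketch, assuming the file is plain INI syntax: they parse odbc.ini-style text with Python’s `configparser`, list the DSNs whose driver file is missing, and build a connection string in the same `DSN=...;UID=...;PWD=...` form passed to pyodbc in this post.

```python
import configparser
import os


def load_dsns(ini_text):
    """Parse odbc.ini-style text into {dsn_name: {key: value}}.

    configparser lower-cases keys and treats lines starting with '#' or ';'
    as comments, which matches how the odbc.ini above is written.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {section: dict(parser[section]) for section in parser.sections()}


def missing_drivers(dsns):
    """Return the DSN names whose Driver= file does not exist on this machine."""
    return [name for name, options in dsns.items()
            if not os.path.exists(options.get("driver", ""))]


def connection_string(dsn, uid=None, pwd=None):
    """Build a connection string such as 'DSN=mysql01;UID=username;PWD=password'."""
    parts = [f"DSN={dsn}"]
    if uid is not None:
        parts.append(f"UID={uid}")
    if pwd is not None:
        parts.append(f"PWD={pwd}")
    return ";".join(parts)
```

Usage, assuming pyodbc is installed: read ~/Library/ODBC/odbc.ini into a string, check `missing_drivers(load_dsns(text))`, then call `pyodbc.connect(connection_string("mysql01", "username", "password"))`.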
Install pyodbc via sudo pip install pyodbc. Test the connections in Python:
import pyodbc as p
con1 = p.connect("DSN=sqlserver01;UID=username;PWD=password")
con1.execute("select name from master..sysdatabases").fetchall()
con2 = p.connect("DSN=mysql01;UID=username;PWD=password")
con2.execute("show databases;").fetchall()
Install R using the installer. Install RODBC in the R interpreter via install.packages("RODBC"). Test the connections in R:
library(RODBC)
ch1 <- odbcConnect(dsn="sqlserver01", uid="username", pwd="password")
odbcQuery(ch1, "select name from master..sysdatabases")
odbcFetchRows(ch1)
ch2 <- odbcConnect(dsn="mysql01", uid="username", pwd="password")
odbcQuery(ch2, "show databases;")
odbcFetchRows(ch2)
More on unixODBC on Mac OS X
If one wants to use unixODBC on Mac OS X instead, note the following:
- First install unixODBC via Homebrew with
brew install unixodbc.
- Compile R from source to have it work with unixODBC (R binaries from the installer use iODBC by default).
- One can choose --with-odbc-manager=odbc when compiling RODBC.
- When compiling freeTDS, include the argument --with-unixodbc (pass it to Homebrew, or to configure when compiling manually).
- I’m unable to compile the MySQL Connector driver on Mac OS X from source (Homebrew or manually), so it won’t work with unixODBC. I believe I tried unixODBC and MySQL Connector from MacPorts, and those worked.
- pyodbc only works with iODBC on Mac OS X (inspect setup file). Currently I can’t get pyodbc to work with unixODBC on Mac OS X.
More differences between unixODBC and iODBC
unixODBC comes with the isql command for accessing different DBMSs from the command line. iODBC comes with the iodbctest and iodbctestw commands. The command
isql works for me on Mac OS X when I set freeTDS up to work with unixODBC (e.g., accessing MS SQL Server). I couldn’t access MySQL server because the MySQL Connector driver was compiled for use with iODBC.
If I use iODBC, I get the following when trying to access a MySQL server:
$ iodbctestw "DSN=mysql01;UID=username;PWD=password"
iODBC Unicode Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.0607.1008
1: SQLDriverConnectW = [MySQL][ODBC 5.1 Driver]Prompting is not supported on this platform. Please provide all required connect information. (0) SQLSTATE=HY000
1: ODBC_Connect = [MySQL][ODBC 5.1 Driver]Prompting is not supported on this platform. Please provide all required connect information. (0) SQLSTATE=HY000
When I try to access SQL Server, I get
$ iodbctestw "DSN=sqlserver01;UID=username;PWD=password"
iODBC Unicode Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.0607.1008
1: SQLDriverConnectW = [FreeTDS][SQL Server]Login failed for user 'username'. (18456) SQLSTATE=42000
2: SQLDriverConnectW = [FreeTDS][SQL Server]Unable to connect to data source (0) SQLSTATE=08001
1: ODBC_Connect = [FreeTDS][SQL Server]Login failed for user 'username'. (18456) SQLSTATE=42000
2: ODBC_Connect = [FreeTDS][SQL Server]Unable to connect to data source (0) SQLSTATE=08001
I don’t know why that is. I guess it’s not too important to have an interactive interpreter; what matters is that the drivers work with R and Python. Perhaps I should consider sqsh or do more searching.
Proc Import is great for importing CSV or other delimited files: things just “work” most of the time. We don’t need to specify variable names, variable types, etc. However, data truncation or mismatched variable types can happen because the procedure determines the type and length of each variable from the first few rows of the file (20 by default).
As this post suggests, one could use the
guessingrows=32767; statement in
Proc Import so SAS uses the first 32k rows to determine data type and length.
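To see why scanning only the first rows can truncate data, here is a toy model in Python. It is not SAS’s actual algorithm; it only assumes that column types and lengths are guessed from the first `guessingrows` rows (20 by default in SAS) and then applied to every row. The function names are hypothetical.

```python
def guess_columns(rows, guessingrows=20):
    """Guess a (kind, length) pair per column from the first `guessingrows`
    rows only; kind is "num" unless some non-empty value fails to parse."""
    guesses = {}
    for row in rows[:guessingrows]:
        for i, value in enumerate(row):
            kind, length = guesses.get(i, ("num", 0))
            if value:
                try:
                    float(value)
                except ValueError:
                    kind = "char"
            guesses[i] = (kind, max(length, len(value)))
    return guesses


def truncated_values(rows, guesses):
    """Return (row_number, column_index, value) for every character value
    longer than the guessed length, i.e. values that would be cut off."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for i, value in enumerate(row):
            kind, length = guesses.get(i, ("char", 0))
            if kind == "char" and len(value) > length:
                problems.append((row_number, i, value))
    return problems
```

With 25 rows of `["bob", "1"]` followed by `["christopher", "2"]`, the default 20-row scan guesses a length of 3 for the name column, so "christopher" on row 26 is flagged; scanning all rows removes the problem, which is the motivation for a large guessingrows value.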
Alternatively, the safer solution is to import the delimited file with a Data step and explicitly use the length statement with a generous length to ensure that no truncation occurs (e.g., length my_var $100). One would also need to specify the data type in the input statement (e.g., $ for character variables). Note: do not specify the variable length through an informat in the input statement (e.g., my_var $100.), because formatted input reads exactly that many characters starting after the last delimiter, and so may pull in characters from the following fields.
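A toy Python model of the pitfall in the note above: list input stops each field at the next delimiter, while a fixed-width read (like an informat such as $5. in the input statement) takes exactly that many characters, delimiters included. The skip-one-delimiter rule in the hypothetical `formatted_input` function is a simplifying assumption, not a precise model of SAS’s input pointer.

```python
def list_input(line, delimiter="|"):
    """List input: each field ends at the next delimiter, whatever its length."""
    return line.split(delimiter)


def formatted_input(line, widths, delimiter="|"):
    """Toy model of reading every field with a fixed-width informat ($5., say).

    Each read takes exactly `width` characters from the current position,
    delimiters included; a delimiter sitting right after a field is skipped.
    """
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
        if pos < len(line) and line[pos] == delimiter:
            pos += 1
    return fields
```

For the line `ab|cdefg|h`, `list_input` returns `["ab", "cdefg", "h"]`, while `formatted_input` with widths `[5, 5, 5]` returns `["ab|cd", "efg|h", ""]`: the first field swallows part of the second. Only when every width matches the actual field length (here `[2, 5, 1]`) do the two agree.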
Just wanted to note that in traditional SQL implementations (e.g., MySQL, MS-SQL), a Group By clause used to aggregate a variable by certain variable(s) returns 1 row per group. Selecting a column that is neither grouped nor aggregated is an error in MS-SQL (and in standard SQL); MySQL accepts it when the ONLY_FULL_GROUP_BY mode is off (it is on by default since MySQL 5.7.5), in which case the value returned for that column is indeterminate (one of the values from the group).
In contrast, SAS’s Proc SQL returns one row for each original row in the group, with the aggregated value repeated on every row within the group (the log notes that the query requires remerging summary statistics back with the original data). Here’s an example:
data foo ;
  infile datalines dlm=" " ;
  input name $ week $ sales ;
  datalines ;
bob 1 20000
bob 2 30000
jane 1 40000
jane 2 50000
mike 1 60000
mike 2 70000
kevin 1 80000
kevin 2 90000
;
run ;

proc sql ;
  create table foo_agg as
  select a.name
       , a.week
       , sum(a.sales) as total_sales
  from foo as a
  group by name ;
quit ;

proc export data=foo_agg outfile="foo_agg.csv" dbms=csv replace ;
run ;
The content of foo_agg.csv looks like
bob,2,50000
bob,1,50000
jane,1,90000
jane,2,90000
kevin,1,170000
kevin,2,170000
mike,2,130000
mike,1,130000
An analogous query in MySQL (with ONLY_FULL_GROUP_BY turned off) might return something like
name,week,total_sales
bob,2,50000
jane,1,90000
kevin,1,170000
mike,2,130000
To get one row per group in Proc SQL, either drop the non-grouped columns (here week) from the select list, or, when the remerged rows are exact duplicates, use Select Distinct to remove them.
Note that when a Group By clause is combined with a join, the result still has multiple records per group.
SAS’s implementation is not necessarily bad: it gives users the flexibility of returning an aggregated variable with every row without re-joining the aggregated table with the original table. The user just has to remember this behavior ;).
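The two result shapes can be reproduced with Python’s built-in `sqlite3` module, using the same foo data. In the sketch below (function names hypothetical), joining each row back to a per-group subquery gives the SAS-style remerged result, while selecting only the grouping column and the aggregate gives one row per group. In engines that support window functions (e.g., MySQL 8, SQL Server, recent SQLite), `sum(sales) over (partition by name)` is another way to get the remerged shape.

```python
import sqlite3

ROWS = [
    ("bob", "1", 20000), ("bob", "2", 30000),
    ("jane", "1", 40000), ("jane", "2", 50000),
    ("mike", "1", 60000), ("mike", "2", 70000),
    ("kevin", "1", 80000), ("kevin", "2", 90000),
]


def make_foo():
    """Create an in-memory database holding the foo table from the example."""
    con = sqlite3.connect(":memory:")
    con.execute("create table foo (name text, week text, sales integer)")
    con.executemany("insert into foo values (?, ?, ?)", ROWS)
    return con


def remerged(con):
    """One output row per input row, each carrying its group's total
    (the shape SAS Proc SQL produces by remerging)."""
    return con.execute(
        """
        select f.name, f.week, t.total_sales
        from foo as f
        join (select name, sum(sales) as total_sales
              from foo group by name) as t
          on f.name = t.name
        order by f.name, f.week
        """
    ).fetchall()


def one_row_per_group(con):
    """Standard GROUP BY: select only the grouping column and the aggregate."""
    return con.execute(
        "select name, sum(sales) as total_sales from foo group by name order by name"
    ).fetchall()
```

`remerged(make_foo())` returns 8 rows (two per person, each with the person’s total), whereas `one_row_per_group(make_foo())` returns 4.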
I wanted to use
%sysexec to execute a shell command with an asterisk (shell wildcard for globbing) in a SAS program:
%sysexec cp /tmp/foo/*.txt /tmp/bar ;
However, it wasn’t giving me the desired results, probably because the /* characters begin a comment in a SAS program. I also tried escaping the asterisk with \* and surrounding the shell command with quotes, but had no luck. I emailed the SAS-L community for help and discovered the X statement and the CALL SYSTEM routine in SAS. The following works:
x "cp /tmp/foo/*.txt /tmp/bar" ;
/* or */
data _null_ ;
  call system("cp /tmp/foo/*.txt /tmp/bar") ;
run ;
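For comparison, when the same copy happens in a Python script rather than in SAS, the wildcard can be expanded without a shell at all, which sidesteps quoting and comment-parsing issues entirely. A minimal sketch with the standard `glob` and `shutil` modules (the function name `copy_matching` is hypothetical; like cp, it expects the destination directory to exist):

```python
import glob
import os
import shutil


def copy_matching(pattern, dest_dir):
    """Copy every regular file matching the glob `pattern` into `dest_dir`.

    The wildcard is expanded by Python's glob module rather than a shell.
    Like cp, this expects dest_dir to exist already. Returns the new paths.
    """
    copied = []
    for src in sorted(glob.glob(pattern)):
        if os.path.isfile(src):
            copied.append(shutil.copy(src, dest_dir))
    return copied
```

Usage: `copy_matching("/tmp/foo/*.txt", "/tmp/bar")` performs the same copy as the shell command above.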
More information on executing shell commands in a SAS program can be found here.