SAS Proc SQL Group By returns multiple rows per group

Just wanted to note that for traditional SQL implementations (e.g., MySQL, MS-SQL), the Group By statement used to aggregate a variable by certain variable(s) returns 1 row for each group. When a column that is not unique within a group is also selected, then the row that’s returned is determined somehow by the DB software.

In contrast, SAS’s Proc SQL will return multiple rows for each group (the number of original rows), with the aggregated variable repeated for each row within a group. Here’s an example:

data foo ;
    infile datalines dlm=" " ;
    input name $ week $ sales ;
    datalines ;
bob 1 20000
bob 2 30000
jane 1 40000
jane 2 50000
mike 1 60000
mike 2 70000
kevin 1 80000
kevin 2 90000
    ;
run ;

proc sql ;
    create table foo_agg as
        select
        a.name
        , a.week
        , sum(a.sales) as total_sales
        from foo as a
        group by name
        ;
    quit ;
run ;

proc export data=foo_agg
    outfile="foo_agg.csv"
    DBMS=csv
    REPLACE
    ;
run ;

The content of foo_agg.csv looks like

name,week,total_sales
bob,2,50000
bob,1,50000
jane,1,90000
jane,2,90000
kevin,1,170000
kevin,2,170000
mike,2,130000
mike,1,130000

An analogous return from the SQL code in MySQL or MS-SQL might look something like

name,week,total_sales
bob,2,50000
jane,1,90000
kevin,1,170000
mike,2,130000

In SAS’s Proc SQL, one would need to use the Select Distinct statement in order to remove the duplicate rows.

Note that when combining the Group By statement with a Join, these multiple records per group still hold.

SAS’s implementation is not necessarily bad as it gives the user’s more flexibility in returning an aggregated variable with every row without re-joining the aggregated table with the original table. The user just has to remember this behavior ;).

Execute shell commands with an asterisk in SAS

I wanted to use %sysexec to execute a shell command with an asterisk (shell wildcard for globbing) in a SAS program:

%sysexec cp /tmp/foo/*.txt /tmp/bar ;

However, it wasn’t giving me the desired results, probably due to the /* characters as they begin a commented section in a SAS program. Also tried escaping the asterisk with \* and surrounding the shell command with quotes but I didn’t get any luck. Emailed the SAS-L community for help and discovered the x and call system statements in SAS. The following works:

x "cp /tmp/foo/*.txt /tmp/bar" ;
/* or */
data _null_ ;
  call system("cp /tmp/foo/*.txt /tmp/bar") ;
run ;

More information on executing shell commands in a SAS program can be found here.

Screen brightness after suspend in Ubuntu

Many laptops have their screens dimmed after returning from “suspend” and cannot get back to their original brightness. The bug hasn’t been fixed for 3 years. A fix is provided in the bug report by putting something like the following in /etc/rc.local:

brt=`cat /sys/devices/virtual/backlight/acpi_video0/brightness`
abrt=`cat /sys/devices/virtual/backlight/acpi_video0/actual_brightness`
if (( $brt != $abrt )) ; then
echo $abrt > /sys/devices/virtual/backlight/acpi_video0/brightness
fi

Use the following

find /sys/ -iname '*bright*'

to see if you need to change the exact path to the files.

Better decision tree graphics for rpart via party and partykit

I’ve been using Graphviz to create better decision tree graphics “by hand” for rpart objects created in R (final tree). I stumbled on this post that shows how one could convert an rpart object to a party project via the as.party function in partykit to utilize the plot functions in party. It looks quite nice.

I might have to do additional hacking as I like to display the node size and percentage of success in every node. For example, in rpart, I do something like

## rpartObj created from rpart
textRpartCustom <- 
{
    nclass <- (ncol(yval) - 1L)/2
    group <- yval[, 1L]
    counts <- yval[, 1L + (1L:nclass)]
    if (!is.null(ylevel)) 
        group <- ylevel[group]
    temp1 <- rpart:::formatg(counts, digits)
    if (nclass > 1) {
        ## temp1 <- apply(matrix(temp1, ncol = nclass), 1, paste, 
        ##     collapse = "/")
      temp1 <- matrix(as.numeric(temp1), ncol=nclass)
      ##temp1 <- paste("p=", round(temp1[, 2] / apply(temp1, 1, sum)*100, 1), "%", "; N=", apply(temp1, 1, sum), sep="")
      temp1 <- paste("", round(temp1[, 2] / apply(temp1, 1, sum)*100, 1), "%", "; ", apply(temp1, 1, sum), sep="") 
    }
    if (use.n) {
        out <- paste(format(group, justify = "left"), "\n", temp1, 
            sep = "")
    }
    else {
        out <- format(group, justify = "left")
    }
    return(out)
}

rpartObj$functions$text <- textRpartCustom
plot(rpartObj)
text(rpartObj)

to get these information to display for a classification fit.