SAS Proc SQL Group By returns multiple rows per group

Just wanted to note that for traditional SQL implementations (e.g., MySQL, MS-SQL), the Group By statement used to aggregate a variable by certain variable(s) returns 1 row for each group. When a column that is not unique within a group is also selected, then the row that’s returned is determined somehow by the DB software.

In contrast, SAS’s Proc SQL will return multiple rows for each group (the number of original rows), with the aggregated variable repeated for each row within a group. Here’s an example:

<pre class="src src-sas"><span style="color: #7fffd4;">data</span> foo ;
<span style="color: #00ffff;">infile</span> datalines dlm=<span style="color: #ffa07a;">" "</span> ;
<span style="color: #00ffff;">input</span> name $ week $ sales ;
datalines ;

bob 1 20000 bob 2 30000 jane 1 40000 jane 2 50000 mike 1 60000 mike 2 70000 kevin 1 80000 kevin 2 90000 ; run ;

proc sql ; create table foo_agg as select a.name , a.week , sum(a.sales) as total_sales from foo as a group by name ; quit ; run ;

proc export data=foo_agg outfile=“foo_agg.csv” DBMS=csv REPLACE ; run ;

The content of foo_agg.csv looks like

<pre class="example">name,week,total_sales

bob,2,50000 bob,1,50000 jane,1,90000 jane,2,90000 kevin,1,170000 kevin,2,170000 mike,2,130000 mike,1,130000

An analogous return from the SQL code in MySQL or MS-SQL might look something like

name,week,total_sales
bob,2,50000
jane,1,90000
kevin,1,170000
mike,2,130000

In SAS’s Proc SQL, one would need to use the Select Distinct statement in order to remove the duplicate rows.

Note that when combining the Group By statement with a Join, these multiple records per group still hold.

SAS’s implementation is not necessarily bad as it gives the user’s more flexibility in returning an aggregated variable with every row without re-joining the aggregated table with the original table. The user just has to remember this behavior ;).

About Vinh Nguyen

Statistician

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>