Output data to Excel for reproducible post-hoc analysis or visualization

As much as I like to analyze and visualize data in R, I sometimes have the need to export results/data into Excel for my business partners or myself to consume in Excel or Powerpoint (eg, create custom/edit-able bar charts with various graphical overlays in a powerpoint slide). As I’ve been using the XLConnect package to read xls/xlsx files, I’m also using it to write data to a sheet in an Excel workbook. I write the necessary data out to the Data sheet:

## read data from files/DB and manipulate to get to the final data set
## now, write data
writeWorksheetToFile(file='foo.xlsx', data=iris, sheet='Data', clearSheets=TRUE) ## in case number of rows is less than before

I ask my collaborators to not edit the Data sheet except adding filters or sorting when they are inspecting/eye-balling the data. I ask them to do all analysis in separate sheets. The reason for this is to keep the work reproducible in case the data needs to be refreshed (error in data, repeat the analysis on new data, etc). That is, when I need to refresh the data, I ask for the modified Excel workbook and write out refreshed data to the same sheet (hence the clearSheets=TRUE option in case the number of rows is less than before). That way, calculations or plots referencing columns in the Data sheet would automatically be refreshed in the workbook with the refreshed data.

This is just another way to prevent inefficiency in the work flow and allows for reproducibility even when collaborating with Excel workbooks.

This work flow should in theory also work with SAS:

proc export data=mydata outfile='foo.xlsx'
    dbms=xlsx ;
    sheet=Data ;
run ;

About Vinh Nguyen


Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>