Much Ado About Nothing: dealing with blank values in your data

Ron Walker


Abstract

Introduction

This paper is intended as a brief survey of techniques for preventing unwanted blank values when reading raw data, identifying and handling blank values in existing datasets, and creating blank values in a dataset when they are needed. It will also briefly describe some of the challenges that blank values can present for downstream programming and data analysis.

Blank Values

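A common source of unwanted blank values is invalid numeric data (typically dates) in a raw file, which SAS converts to missing (blank) values when it reads the data. These values can be identified programmatically, for example by flagging records where the raw field contained text but the converted value is missing.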
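The check described above can be sketched in a few lines. The sketch is written in Python rather than SAS, purely to show the logic; it assumes dates arrive as YYYY-MM-DD text, and the function names read_date and find_unwanted_blanks are hypothetical. A field is flagged only when it contained text that could not be converted, so fields that were genuinely empty in the raw file are not reported.

```python
from datetime import datetime

def read_date(raw):
    """Convert raw 'YYYY-MM-DD' text to a date (assumed format).

    Empty or invalid text becomes None, playing the role of a missing value.
    """
    text = raw.strip()
    if not text:
        return None
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # Impossible dates (e.g. Feb 30) and wrong formats land here.
        return None

def find_unwanted_blanks(raw_values):
    """Return (position, raw text) for fields that had text but converted to missing."""
    flagged = []
    for i, raw in enumerate(raw_values):
        if raw.strip() and read_date(raw) is None:
            flagged.append((i, raw))
    return flagged

raw_dates = ["2021-03-15", "2021-02-30", "", "15MAR2021"]
print(find_unwanted_blanks(raw_dates))  # [(1, '2021-02-30'), (3, '15MAR2021')]
```

The empty third field is left alone because it was blank in the raw data; only the two values that were lost during conversion are reported for review.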

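Blank values can cause problems with assignment statements, counts, calculations, and data modeling. For example, SAS treats a missing numeric value as smaller than any valid number, so a condition such as age < 18 is also true for records where age is blank. The good news is that SAS has a variety of procedures and options for identifying and handling blank values in datasets.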
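A small sketch can show two of these problems: counting the blanks in each variable, and how the decision to skip blanks or treat them as zero changes a simple average. It is written in Python for illustration only (not SAS code), and the data and the helper names count_missing and mean_ignoring_missing are hypothetical. Skipping blanks mirrors how SAS computes summary statistics such as the mean, which are based only on nonmissing values.

```python
def count_missing(rows, columns):
    """Count missing (None) values per column across a list of records."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

def mean_ignoring_missing(values):
    """Average only the nonmissing values; return None if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical records; None stands in for a blank value.
rows = [
    {"id": 1, "score": 80, "age": 34},
    {"id": 2, "score": None, "age": 41},
    {"id": 3, "score": 90, "age": None},
    {"id": 4, "score": None, "age": 29},
]

print(count_missing(rows, ["id", "score", "age"]))  # {'id': 0, 'score': 2, 'age': 1}

scores = [r["score"] for r in rows]
print(mean_ignoring_missing(scores))  # 85.0

# Treating blanks as zero changes both the numerator and the meaning.
as_zero = [v if v is not None else 0 for v in scores]
print(sum(as_zero) / len(as_zero))  # 42.5
```

Treating the two blank scores as zero halves the average, which is why the meaning of a blank has to be settled before any calculation is run.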

On the flip side, data created by other programs often use codes such as NA or 999 to mark missing values, and these codes need to be converted to blanks in SAS. These “wanted” blanks are easy to create using basic SAS programming techniques.
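The conversion amounts to checking each value against a list of codes that mean “missing.” The sketch below is in Python for illustration only; the code list and the function name to_missing are hypothetical. In a SAS DATA step the same idea is commonly written as an IF statement; for example, the statement if score = 999 then score = .; sets SCORE to missing wherever the code 999 appears.

```python
# Hypothetical codes that the source system uses to mean "missing".
# Apply such a list per variable: 999 may be a real value elsewhere.
MISSING_CODES = {"NA", "N/A", "999"}

def to_missing(raw, codes=MISSING_CODES):
    """Convert raw text to a number, turning blanks and missing codes into None."""
    text = raw.strip()
    if text == "" or text.upper() in codes:
        return None
    return float(text)

print([to_missing(v) for v in ["12", "NA", "999", " 7.5 ", ""]])
# [12.0, None, None, 7.5, None]
```

Keeping the codes in one named set makes the assumption about what counts as “missing” visible and easy to change for each source.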

Summary

Blank values in data are not necessarily bad if the programmer is aware of them and the meaning of “blank” is clear to all users. For example, “I took 20 shots and made zero” is quite different from “I took 20 shots and made _.” I recommend that a blank value signify only data that were not recorded or captured.