’Tis better to give than receive: considerations when sharing data

Melissa R. Pfeiffer
Children’s Hospital of Philadelphia


Abstract

Data exchanges are increasingly common and are critical in propelling advancements in science and industry. However, these exchanges are not always seamless. Issues can arise both when receiving data from others and when distributing data, although the latter is usually easier. The aim of this paper is to provide guidance for both situations. For receiving data, we will discuss the importance of obtaining basic information about the data, such as a data dictionary or file layout that gives the field names, types, lengths, and meanings. At a bare minimum, recipients should ask how many records to expect so that they can confirm that everything was imported into SAS. We will discuss issues that may occur when importing non-SAS data into SAS, along with possible solutions. Finally, we will review some checks that can confirm that the files received are structurally sound. For distributing data, we will discuss strategies for creating non-SAS data files, depending on which file type the recipient needs (e.g., Excel or CSV), whether column headings should be variable names or labels, and whether values should be formatted or unformatted. To help build our reputations for giving generously, we will describe the documentation that should accompany the data you provide: a data dictionary (either simple or more detailed) and, when applicable, variable format information. Collaborations and data sharing should be reasons for celebration rather than trepidation; the aim of this paper is to make these processes a little easier.