Data Dictionary
A data dictionary contains important information about the variables and data entered into databases.
Researchers generate data when conducting research. The database serves as a repository for study data and as a place to manipulate data.
When databases are created during the planning stages of a research study, variable names are created, the codifications of categorical and ordinal variables are chosen, and the scales of measurement for all demographic, predictor, confounding, and outcome variables are specified. The data dictionary serves as a repository for these decisions about the study's variables. Data dictionaries are created concurrently with databases and provide important information about the variables and data entered into databases.
The data dictionary contains information about each variable and the data that is being collected for each variable. For each variable in a database, researchers should enter the following information into a data dictionary:
1. The name of the variable
2. The codifications or levels of each categorical or ordinal variable
3. The scale of measurement for the variable
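The three items above can be sketched as a small record type. This is only an illustration: the class name `VariableEntry`, its field names, and the `SCALES` set are assumptions made for this example, not part of any particular database tool.

```python
from dataclasses import dataclass, field
from typing import Dict

# Scales of measurement used in this article's examples (assumed to be the full set here).
SCALES = {"categorical", "ordinal", "continuous"}


@dataclass(frozen=True)
class VariableEntry:
    """One entry of a data dictionary (hypothetical structure)."""

    name: str                                             # 1. the name of the variable
    scale: str                                            # 3. the scale of measurement
    codes: Dict[int, str] = field(default_factory=dict)   # 2. codifications or levels

    def __post_init__(self):
        # Reject scales that the dictionary does not recognize.
        if self.scale not in SCALES:
            raise ValueError(f"unknown scale of measurement: {self.scale!r}")
        # Categorical and ordinal variables must list their codifications.
        if self.scale in {"categorical", "ordinal"} and not self.codes:
            raise ValueError(f"{self.name}: codifications are required")
```

Requiring codifications for categorical and ordinal variables at creation time is one way to keep an incomplete entry out of the dictionary; a continuous variable simply leaves `codes` empty.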
Data dictionary examples
For example, if researchers wanted to collect a categorical demographic variable like "Gender," then they could enter the following into a data dictionary:
Gender - 0 = male and 1 = female, categorical
Gender is the name of the variable, "0 = male and 1 = female" are the codifications for both levels, and the variable is measured at a categorical level.
Another example would be for a Likert-type variable:
Satisfaction - 1 = Strongly Dissatisfied, 2 = Dissatisfied, 3 = Neither Satisfied nor Dissatisfied, 4 = Satisfied, and 5 = Strongly Satisfied, ordinal
Satisfaction is the name of the variable, the numerical codes associated with each level are presented, and the variable is measured at an ordinal level.
Finally, an example of a continuous variable:
BMI - Enter the BMI value, continuous
BMI is the name of the variable, the continuous value of BMI is entered, and the variable is measured at a continuous level.
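As a rough sketch, the three example entries could also be stored in machine-readable form and rendered back into one-line descriptions. The dictionary layout and the `describe` helper are hypothetical names chosen for this illustration.

```python
# The three example variables, stored as a plain nested dictionary (assumed layout).
data_dictionary = {
    "Gender": {
        "scale": "categorical",
        "codes": {0: "male", 1: "female"},
    },
    "Satisfaction": {
        "scale": "ordinal",
        "codes": {
            1: "Strongly Dissatisfied",
            2: "Dissatisfied",
            3: "Neither Satisfied nor Dissatisfied",
            4: "Satisfied",
            5: "Strongly Satisfied",
        },
    },
    "BMI": {"scale": "continuous", "codes": {}},  # no codes: the raw value is entered
}


def describe(name):
    """Render one entry as 'name - codifications, scale'."""
    entry = data_dictionary[name]
    if entry["codes"]:
        levels = ", ".join(f"{code} = {label}" for code, label in entry["codes"].items())
    else:
        levels = f"Enter the {name} value"
    return f"{name} - {levels}, {entry['scale']}"


for variable in data_dictionary:
    print(describe(variable))
```

Keeping the entries in one structure like this means the printed code book and any data-entry checks can be generated from the same source.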
Importance of a data dictionary
The importance of the data dictionary should not be overlooked when creating a database. It contains the language or lexicon by which researchers communicate with each other when conducting a study. The data dictionary should be created concurrently with the database so that there is logical flow and consistency between them. If any changes are ever made to a database or to a data dictionary, then all members of a research team should be made aware. If these changes are not disseminated to the entire team, then researchers may enter data using different codification schemes, which can seriously undermine the validity and credibility of the research data.
Creating an objective and standardized methodology for data entry is extremely important in the beginning stages of planning a study. Communication and collaboration are also of paramount importance when planning data collection.
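One way to catch some codification mismatches early is to check entered values against the codes in the data dictionary. The sketch below assumes the codes are stored as a Python dictionary; the function name `find_invalid_values` and the sample data are invented for illustration.

```python
def find_invalid_values(codes, values):
    """Return the entered values that are not defined in the codification."""
    return [value for value in values if value not in codes]


# Codification from the data dictionary: 0 = male, 1 = female.
gender_codes = {0: "male", 1: "female"}

# Hypothetical entries where one team member used a 1/2 scheme instead of 0/1.
entered = [0, 1, 1, 2, 0]

print(find_invalid_values(gender_codes, entered))  # [2]
```

A check like this only flags values outside the defined codes. It cannot detect a team member who uses valid codes with swapped meanings, so it complements communication across the team rather than replacing it.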