Data Processing and Management
Computer-Assisted Data Entry
CATI technology may also be used as a method for entering data
from source documents (e.g., self-administered questionnaires,
records, and interview forms) directly onto computer files. Often
it is possible to combine coding, data entry, and data cleaning
into a single operation, bypassing the traditional paper-and-pencil
coding and data entry procedures. Computer-assisted data entry
on SRC's UNIX operating system has proven to be less costly than
the traditional methods, and it generally results in more rapid
and efficient production of data files and tapes.
For more information on computer-assisted data entry, contact Robert
L. McCarthy, Manager of Data Services, (510) 642-6596, bobmc@berkeley.edu.
Coding
The Center's data management staff prepares and edits completed
interviews, questionnaires, and related documents for computer
processing. Its services are available to investigators making
separate arrangements for fieldwork as well as to those utilizing
the Center's field staff. These services include:
- laying out and precoding of data collection instruments prior to printing
in order to facilitate efficient processing;
- editing and coding of both check-list and open-ended questions and related
documents in preparation for data entry;
- coding of occupations, industries, and geographic areas to U.S. Census
classifications;
- constructing codes for open-ended survey questions, depth interviews, and
other qualitative materials;
- preparing study codebooks and variable labels to facilitate analysis;
and
- identifying and correcting data errors by reference to source documents.
Since coding costs vary greatly depending on the layout and extent
of precoding of survey instruments, clients are urged to consult
with the coding staff prior to final drafting of their data collection
instruments.
For more information on coding, contact Robert L. McCarthy, Manager
of Data Services, (510) 642-6596, bobmc@berkeley.edu.
Data Cleaning, File Construction & Codebook Preparation
Once data have been entered into computer files, SRC staff can assist
users in checking, cleaning, recoding, and labeling the data prior
to analysis. Such assistance may include the following:
- running edit programs to check for identification numbers, to identify
wild codes, and to resolve inconsistencies;
- preparing analysis files with appropriate variable and category labels,
missing data specifications, and weights;
- creating new variables by arithmetic, logical, scaling, and recoding
operations;
- combining data from linked or nested files into composite files, e.g.,
assigning Census tract-level variables to households or individuals
within tracts;
- updating master files and producing routine reports, summary descriptive
statistics, and preliminary tabulations; and
- preparing full documentation on the structure, content, and marginal distributions
of datasets, including Microsoft Word and PDF versions of codebooks.
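Two of the operations listed above, checking for wild codes and assigning tract-level variables to households, can be illustrated with a minimal Python sketch. It is not SRC's actual edit program; the variable names, valid-code sets, and sample records are all hypothetical and exist only to show the idea.

```python
# Hypothetical valid codes per variable (illustrative only; 9 = missing).
VALID_CODES = {
    "sex": {1, 2, 9},
    "employed": {0, 1, 9},
}

def find_wild_codes(records, valid_codes):
    """Return (record id, variable, value) for every value outside its valid set."""
    problems = []
    for rec in records:
        for var, allowed in valid_codes.items():
            if rec.get(var) not in allowed:
                problems.append((rec["id"], var, rec.get(var)))
    return problems

def attach_tract_variables(households, tracts):
    """Copy tract-level variables onto each household record, matched by tract id."""
    merged = []
    for hh in households:
        tract_vars = tracts.get(hh["tract"], {})  # empty if the tract is unknown
        merged.append({**hh, **tract_vars})
    return merged

# Made-up example data.
households = [
    {"id": 101, "tract": "4001", "sex": 1, "employed": 1},
    {"id": 102, "tract": "4001", "sex": 3, "employed": 0},  # sex=3 is a wild code
    {"id": 103, "tract": "4002", "sex": 2, "employed": 7},  # employed=7 is a wild code
]
tracts = {
    "4001": {"tract_median_income": 52000},
    "4002": {"tract_median_income": 61000},
}

wild = find_wild_codes(households, VALID_CODES)
composite = attach_tract_variables(households, tracts)
```

In practice each flagged value would be resolved by going back to the source document rather than being corrected automatically.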
The final delivery products each client should expect are: (1)
a clean, raw data file in the format desired; (2) a data definition
file describing the raw data; and (3) a codebook containing variable
names, titles, category labels, and question text. Typically,
these codebooks also show the frequency distribution of all
variables as well as summary statistics. Our process for creating
codebooks also yields a useful byproduct: data definition files
for three of the more popular statistical packages, SAS, SPSS,
and Stata. Clients should be able to begin analysis as soon as
they receive these deliverables.
Codebooks and data files may also be used as interim reports.
Our staff will prepare and deliver them by the requested dates.
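As a rough illustration of what a codebook entry with a frequency distribution might contain, here is a minimal Python sketch. The layout, the function name `codebook_entry`, and the sample variable are assumptions made for this example and do not reproduce the Center's actual codebook format.

```python
from collections import Counter

def codebook_entry(name, title, labels, values):
    """Format one variable as text: name, title, then code, label, count, percent."""
    counts = Counter(values)
    n = len(values)
    lines = [f"{name}: {title}"]
    if n == 0:
        return lines[0]  # no data, so only the header line
    for code in sorted(counts):
        label = labels.get(code, "(unlabeled)")
        pct = 100.0 * counts[code] / n
        lines.append(f"  {code}  {label:<12} {counts[code]:>4} {pct:5.1f}%")
    return "\n".join(lines)

# Hypothetical variable with made-up responses.
entry = codebook_entry(
    "sex", "Respondent sex", {1: "Male", 2: "Female"}, [1, 2, 2, 1, 2]
)
print(entry)
```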
For further information on data cleaning, file construction, and
codebook preparation, contact Robert L. McCarthy, Manager of
Data Services, (510) 642-6596, bobmc@csm.berkeley.edu.
Data Analysis and Computing Services
Staff members can assist clients in the statistical analysis of
their data. In the planning stages of a study, staff members can
help frame analysis strategies to guide the choice of data collection
procedures. Once data are ready for analysis, staff members can
advise on or carry out specific analyses, including the following:
- data reduction techniques to build indices for subsequent study,
e.g., scale construction, clustering, and factor analysis;
- tabulation techniques for both investigation and final presentation;
- graphic techniques for picturing univariate distributions, time
series, geographic occurrence, and more complex data structures,
e.g., bar charts, scatter plots, and maps;
- estimation of standard errors for sample statistics from complex
samples; and
- multivariate techniques for quantitative and categorical variables,
e.g., linear regression and analysis of variance, log-linear
analysis of cross-classifications, logistic regression for binary
and categorical dependent variables, and structural equation
estimation.
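To make "standard errors from complex samples" concrete, the sketch below computes the standard error of a mean under one simple design: stratified random sampling with known stratum population shares. It ignores the finite population correction, clustering, and unequal weights within strata, so it is an assumption-laden illustration rather than the method Center staff would necessarily use. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def stratified_mean_and_se(strata):
    """Estimate a population mean and its standard error from a stratified sample.

    strata: list of (W_h, values), where W_h is the stratum's share of the
    population (shares sum to 1) and values are the sampled observations.
    Uses Var = sum over strata of W_h^2 * s_h^2 / n_h, with no finite
    population correction.
    """
    estimate = 0.0
    var = 0.0
    for share, values in strata:
        estimate += share * mean(values)
        var += share ** 2 * variance(values) / len(values)
    return estimate, math.sqrt(var)

# Made-up sample: two strata, each half of the population.
est, se = stratified_mean_and_se([
    (0.5, [1, 2, 3]),
    (0.5, [4, 6, 8]),
])
```

Treating the same six observations as a simple random sample would give a different standard error, which is why the sample design has to be carried into the analysis.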
Experience with a wide variety of packaged computer programs enables Center
staff to help link clients' analytic goals to the capabilities
of available hardware and software. In conjunction with data analysis
work, Center staff can also prepare written technical summaries
of methods used or full reports on study results.
For further information on data analysis services offered by the
Survey Services Facility, contact Thomas L. Piazza, Senior
Survey Statistician, piazza@csm.berkeley.edu
or Yuteh Cheng, Manager of Technical Services, yuteh@hobbes.berkeley.edu.
Last modified: 4 February 2008