The overall goal of the IHS Facility Core is to facilitate both patient-oriented and population-based research within the Center. It promotes the strategic goals of the CEHNM by supporting existing research projects and fostering new interdisciplinary activities focused on environmental health problems. The Core provides the infrastructure to carry out these studies and has several components, including epidemiologic, data management and biostatistical capabilities and a Biomarkers Laboratory for sample processing, storage, distribution and analysis. The Core also seeks to address the needs of CEHNM investigators as their research focus changes: the addition of pyrosequencing and high-throughput genotyping methods was the result of member requests, as was the development of oxidative stress assays and the enhancement of data management capabilities. The Biomarkers Laboratory also develops and validates new assays to meet the changing needs of CEHNM investigators.
In addition, the Core offers pro bono or low-cost services to new CEHNM pilot projects. Training of students and postdoctoral fellows is an important component of Core activities; it allows methodologies to be exported to other laboratories and provides access to the equipment and expertise needed to conduct studies requiring DNA adduct detection, measurement of oxidative stress markers, genotyping and other immunoassays of interest. The Core provides efficient, cost-effective, web-based and other database management systems, along with data management consulting, programming and an extensive, secure hardware/network infrastructure. The Core has also expanded to include a major formal educational component. Finally, the IHS Facility Core works with the Community Outreach and Engagement Core (COEC) to disseminate research results to the surrounding community.
jlt12 [at] columbia.edu (Seamus Thompson), PhD
prf1 [at] columbia.edu (Pam Factor-Litvak), PhD
Biomarkers Laboratory Director:
rps1 [at] columbia.edu (Regina M. Santella), PhD
xl26 [at] columbia.edu (Xinhua Liu), PhD
Biomarkers: ig112 [at] columbia.edu (Irina Gurvich), MS
Data: rb539 [at] columbia.edu (Richard Buchsbaum), MS
Description of Services
Study Design/Data Analysis
Initial study design consultation
This focuses on establishing testable aims and hypotheses and on incorporating both existing and new study design methods. These may include directed acyclic graphs (DAGs) to represent all relevant variables, latent and instrumental variable analysis, and propensity scoring. Such methods must be incorporated at the planning stage of a study and are especially important in environmental research, where exposure is sometimes poorly characterized (calling for instrumental and latent variables) and potentially confounding variables may be elusive (calling for DAGs). Further, especially when studying complex diseases where multiple exposures, genetic variants and epigenetic changes may be part of the causal framework, attention should be paid to possible modification and mediation models.
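The role a DAG plays at the planning stage can be sketched with a toy example. The variables and edges below are invented for illustration; a real DAG would encode the study's own substantive assumptions about exposure, outcome and covariates.

```python
# Toy DAG: each key lists the nodes it points to. "SES" is a
# hypothetical common cause of both exposure and outcome.
dag = {
    "SES": ["Exposure", "Outcome"],
    "Exposure": ["Outcome"],
    "Outcome": [],
}

def parents(node):
    """Return the nodes with an edge into `node`."""
    return sorted(src for src, dsts in dag.items() if node in dsts)

# Any common cause of exposure and outcome is a candidate confounder
# that the analysis plan must account for.
confounders = sorted(set(parents("Exposure")) & set(parents("Outcome")))
print(confounders)  # ['SES']
```

Writing the graph down before data collection makes explicit which variables must be measured so that confounding can be addressed in the analysis.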
Data collection consultation
This includes strategies for data collection, e.g., the appropriate mode (in person, telephone, mail), operationalizing variables, and incorporating multi-level determinants such as social and neighborhood characteristics and molecular biomarkers. It may also include strategies for the creation of scales. Data analysis and interpretation consultation includes construction of appropriate conceptual and statistical models to test the hypotheses, and strategies to evaluate confounding, effect modification and mediation. It may also include options for data presentation.
Sophisticated data collection, storage and quality control techniques are available and can greatly enhance the efficiency and scientific precision of research projects. Ms. Levy’s technical capacity extends to database administration (including table design and implementation as well as querying and dataset creation), graphical user interface programming, web programming, and serving as the go-to person for all data requirements.
Consultation on data collection plan/ design of collection instruments
Investigators are increasingly interested in alternatives to collecting data on paper questionnaires and subsequently entering them into a database system. By consulting with the Core during the early stages of a project, investigators can plan highly efficient systems. The Core ensures that proposed questionnaires are consistent and free of ambiguities, and early consultation in the form-design process allows Core members to steer investigators toward designs that lend themselves to efficient data collection.
Design and programming of relational database
We are committed to using the relational database structure to collect and store all research data, independent of where they originated. Data collected from questionnaires, laboratories and electronic equipment are stored in separate tables that are related to each other by a key identifier, usually the Subject ID. Attempts to use spreadsheets and/or statistical software packages to collect and store data are ill-advised, as they often push the limits of those tools and thus compromise data integrity and security. The relational database system is based on rigorous standards; data tables are designed according to standardized rules (known as normalization). Database administrators are highly trained to adhere to these standards and to use the available tools to develop a fine-tuned database that, on its own, incorporates general data rules as well as rules specific to the research project. Accordingly, these rules do not have to be programmed into a separate application, and the database is portable because it stands on its own. It can be shared (subject to security permissions and HIPAA regulations), archived and later restored, and/or expanded as necessary.
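The layout described above can be sketched in miniature. The table and column names here are illustrative, not the Core's actual schema; the point is that questionnaire and laboratory data live in separate tables tied together by the Subject ID, with the rules (keys, foreign-key constraints) enforced by the database itself rather than by a separate application.

```python
import sqlite3

# In-memory database standing in for the production server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

conn.executescript("""
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY
);
CREATE TABLE questionnaire (
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    question   TEXT NOT NULL,
    answer     TEXT,
    PRIMARY KEY (subject_id, question)
);
CREATE TABLE lab_result (
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    assay      TEXT NOT NULL,
    value      REAL,
    PRIMARY KEY (subject_id, assay)
);
""")

conn.execute("INSERT INTO subject VALUES ('S001')")
conn.execute("INSERT INTO questionnaire VALUES ('S001', 'smoker', 'no')")
conn.execute("INSERT INTO lab_result VALUES ('S001', '8-oxodG', 4.2)")

# The key identifier (Subject ID) joins the separate tables back together.
row = conn.execute("""
    SELECT q.answer, l.value
    FROM questionnaire q JOIN lab_result l USING (subject_id)
    WHERE q.subject_id = 'S001'
""").fetchone()
print(row)  # ('no', 4.2)
```

Because the constraints live in the schema, any application (data entry screen, web form, import routine) that writes to these tables is held to the same rules.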
Innovative Data Entry Systems
A graphical user interface system is created to allow data entry for projects that include questionnaires. Depending on the size of the questionnaire, the geographical location of staff, and personnel availability given concurrent projects, decisions are made early in the project about which technologies to use and where, and at which stage in the project, data entry will occur. Data entry systems are programmed to duplicate the paper questionnaires and to include intricate validation routines and skip patterns. These attributes keep data entry errors low and data as clean as possible. Management of the data entry process is provided as needed. Projects are often designed to collect data from electronic apparatus, such as air monitoring equipment; the data management unit has worked collaboratively with investigators to design and program automatic data import routines.
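The validation routines and skip patterns mentioned above can be illustrated with a minimal sketch. The field names and rules here are hypothetical, not taken from any Core questionnaire.

```python
# Sketch of the checks a data entry screen might enforce:
# a range check, plus a skip pattern tying one field to another.
def validate_entry(record):
    errors = []

    # Range check: age must be present and plausible.
    age = record.get("age")
    if age is None or not (0 <= age <= 120):
        errors.append("age missing or out of range")

    # Skip pattern: cigarettes_per_day applies only to smokers.
    if record.get("smoker") == "no" and record.get("cigarettes_per_day") is not None:
        errors.append("cigarettes_per_day must be blank for non-smokers")
    if record.get("smoker") == "yes" and record.get("cigarettes_per_day") is None:
        errors.append("cigarettes_per_day required for smokers")

    return errors

print(validate_entry({"age": 45, "smoker": "no"}))   # []
print(validate_entry({"age": 200, "smoker": "yes"}))
```

Catching these errors at the keyboard, rather than at analysis time, is what keeps the stored data clean.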
Creation of Datasets for Statistical Analysis
Datasets are created for investigators in the format they request (e.g., SAS, Microsoft Excel, Microsoft Access, text files). Ms. Levy works closely with biostatisticians to determine which variables are required for the analysis being performed, allowing smaller, targeted datasets to be distributed as needed. Statisticians favor these datasets because less time is spent on tasks such as merging, calculating date ranges and consulting code books.
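The targeted-dataset step amounts to merging sources by Subject ID and keeping only the requested variables. A minimal sketch, with invented variable names and values:

```python
import csv
import io

# Two sources keyed by Subject ID (values invented for illustration).
questionnaire = {"S001": {"age": 45, "smoker": "no"},
                 "S002": {"age": 52, "smoker": "yes"}}
lab = {"S001": {"8oxodG": 4.2},
       "S002": {"8oxodG": 6.8}}

# Only the variables the statistician asked for end up in the file.
requested = ["subject_id", "age", "8oxodG"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=requested)
writer.writeheader()
for sid in sorted(questionnaire):
    merged = {"subject_id": sid, **questionnaire[sid], **lab.get(sid, {})}
    writer.writerow({k: merged.get(k) for k in requested})

print(buf.getvalue())
```

The same merged rows could just as easily be written as a SAS transport file or Excel sheet; the point is that the merge and variable selection happen once, centrally, instead of in each statistician's own code.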
Backup and Security: Storage in a central location, on a secure server, ensures that the data are backed up daily and that access is available only to authorized users with individual passwords. Our systems use Windows integrated security, which ensures that user IDs and passwords are encrypted and secure when they are sent over the network.
Processing and Storage of Biological Samples
The Core provides initial consultation on all aspects of sample collection and processing and standard operating procedures are established for each study. Samples are coded to maintain confidentiality using preprinted bar code labels. Blood samples are fractionated according to study-unique protocols with each vial of each fraction coded for tracking of individual vials and their freeze/thaw cycles. The Core also processes and stores urine and oral/saliva samples. Whenever possible, aliquots of the various materials are stored and available for subsequent new types of assays that may become of interest. A list of cohorts with available biospecimens can be found here.
Isolation of DNA/RNA
DNA is isolated from cells, tissue and plasma using various protocols, primarily Qiagen kits or salting out, with routine quality control determination of the 260/280 ratios. Data are exported directly to an Excel file for determination of concentration and purity. DNA can also be isolated from the filter cards and cross-checked for microsatellite repeats against the DNA from aliquoted cells as an identity check. Stock DNAs are kept in vials but can be prepared in 96-well deep-well storage plates. The database contains information on total DNA yield, A260/280 ratios, number of aliquots on hand, number shipped and recipient, and remaining DNA stock. Whole genome amplification is also available, as is RNA extraction.
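The concentration and purity calculation behind that QC step is the standard spectrophotometric one: an A260 of 1.0 corresponds to roughly 50 µg/mL of double-stranded DNA, and an A260/A280 ratio near 1.8 indicates a reasonably pure preparation. A minimal sketch (the acceptance thresholds here are illustrative, not the Core's actual cutoffs):

```python
# Convert raw absorbance readings into concentration and a purity flag.
def dna_qc(a260, a280, dilution_factor=1.0):
    concentration = a260 * 50.0 * dilution_factor  # ug/mL dsDNA
    ratio = a260 / a280                            # ~1.8 for clean DNA
    acceptable = 1.7 <= ratio <= 2.0               # illustrative cutoffs
    return concentration, round(ratio, 2), acceptable

print(dna_qc(0.5, 0.27))  # (25.0, 1.85, True)
```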
The Core provides analytical support including measurement of DNA adducts, oxidative stress markers and other analytes by ELISA, DNA methylation using MethyLight, LUMA or pyrosequencing, and SNP genotyping. Available ELISAs include PAH-DNA and protein adducts, protein carbonyls, and urinary 8-oxodG, isoprostanes, PAH and aflatoxins; in fact, any ELISA available as a commercial kit can be, and has been, run by the Core (e.g., G-CSF, GM-CSF, C-reactive protein). Consultation on genotyping platforms is provided, with studies of a limited number of SNPs carried out within the Core using TaqMan assays in 384-well format or the Biotrove nanowell TaqMan system for 32 SNPs. Larger studies are referred to other on-campus facilities where Sequenom and Illumina instrumentation are available. The IHSFC provides a mechanism by which investigators can have samples analyzed, but it also teaches investigators how to analyze samples in their own laboratories or using Core equipment, making reagents and instrumentation available to them. It instructs on the importance of assay-specific quality controls (e.g., pooled and duplicate samples) in all assays. Although the laboratory currently runs at maximum capacity, additional personnel are frequently hired for specific large-scale studies of long duration.
Biorepository Data Management
Samples are identified by unique sample IDs that allow linkage of the sample inventory with the information available from questionnaires, surveys, etc. After a sample is processed, aliquoted and stored, a paper record with an attached sample bar code documents the date collected and processed, who processed the sample, the sample amount and how many aliquots of each type were made, on a preset form specific to the particular study. An online data entry system for bulk recording of new sample storage automatically assigns sample location. Information in the database includes the study identifiers, sample ID, aliquot type, volume, location (freezer, shelf, rack, box and position in box), date received/processed and the technician. The database stores information on the results of DNA extraction and on all use and shipment of specimens, and employs a number of internal checks to ensure data integrity. The system uses an interface created in Microsoft Access for dynamic querying and searches, automatic entry of default values, and automation of routine tasks. Data are stored permanently in an MS SQL Server 2005 database, which enforces quality constraints and referential integrity. Access to the data is via an encrypted HTTP (internet) connection and is provided only to authorized users with valid passwords. A separate web-based system was developed to store research results; it records the methods used, the person generating the data and the test results, and allows multiple users simultaneous, secure access. The system currently stores information on approximately 300,000 unique biospecimen aliquots, and the IHSFC currently stores biospecimens from over 75 studies.
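The inventory record described above can be sketched as a small data structure. The field names mirror the database columns (sample ID, aliquot type, volume, and the freezer/shelf/rack/box/position location), but the names and IDs are illustrative, not the Core's actual schema.

```python
from dataclasses import dataclass

# One record per physical vial; the sample ID links back to the
# questionnaire and assay data stored elsewhere in the database.
@dataclass
class Aliquot:
    sample_id: str
    aliquot_type: str
    volume_ul: float
    freezer: str
    shelf: int
    rack: int
    box: int
    position: int

inventory = {}

def store(aliquot):
    """Record a newly stored vial under its sample ID."""
    inventory.setdefault(aliquot.sample_id, []).append(aliquot)

store(Aliquot("S001-P1", "plasma", 500.0, "F3", 2, 4, 17, 81))
store(Aliquot("S001-P2", "plasma", 500.0, "F3", 2, 4, 17, 82))

# Locate every aliquot of a sample before pulling it for an assay.
locations = [(a.freezer, a.box, a.position) for a in inventory["S001-P1"]]
print(locations)  # [('F3', 17, 81)]
```

In production this lookup is a database query through the Access interface rather than an in-memory dictionary, but the linkage logic is the same: the sample ID is the handle that ties a physical freezer position to a study subject.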