
Preprocess GWAS Summary Statistics for MRcare
This function preprocesses GWAS summary statistics to prepare them for use with MRcare. It standardizes column names, performs quality control, and saves the processed data to a file for later use.
Usage
preprocess_gwas_data(
  gwas_data,
  output_file,
  data_type = c("exposure", "outcome"),
  A1 = NULL,
  A2 = NULL,
  SNP = NULL,
  CHR = NULL,
  POS = NULL,
  BETA = NULL,
  SE = NULL,
  Pval = NULL,
  MAF = NULL,
  N = NULL,
  Zscore = NULL,
  Ninput = NULL,
  maf_filter = 0.01,
  remove_ambiguous = TRUE,
  verbose = TRUE
)
Arguments
- gwas_data
Path to a GWAS summary statistics file or a data frame containing GWAS summary statistics
- output_file
Path where the processed GWAS data will be saved
- data_type
Either "exposure" or "outcome" to determine processing logic
- A1
Column name for effect allele (default: NULL, will try to detect)
- A2
Column name for other allele (default: NULL, will try to detect)
- SNP
Column name for SNP ID (default: NULL, will try to detect)
- CHR
Column name for chromosome (default: NULL, will try to detect)
- POS
Column name for position (default: NULL, will try to detect)
- BETA
Column name for effect size (default: NULL, will try to detect)
- SE
Column name for standard error (default: NULL, will try to detect)
- Pval
Column name for p-value (default: NULL, will try to detect)
- MAF
Column name for minor allele frequency (default: NULL, will try to detect)
- N
Column name for sample size (default: NULL, will try to detect)
- Zscore
Column name for Z-score (default: NULL, will try to detect)
- Ninput
Sample size to use when the data has no sample size column (default: NULL)
- maf_filter
Minimum MAF threshold to include SNPs (default: 0.01)
- remove_ambiguous
Whether to remove strand-ambiguous SNPs (default: TRUE)
- verbose
Whether to print progress messages (default: TRUE)
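The two quality-control arguments, maf_filter and remove_ambiguous, can be illustrated with a small base R sketch. This is not MRcare's implementation: the toy data, the helper is_ambiguous(), and the choice of an inclusive threshold (MAF >= maf_filter) are assumptions made for illustration. Strand-ambiguous SNPs are taken here in the usual sense of palindromic allele pairs (A/T and C/G), whose strand cannot be inferred from the alleles alone.

```r
# Toy summary statistics; values and column names are illustrative only.
gwas <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  A1  = c("A", "A", "C", "G", "T"),
  A2  = c("G", "T", "G", "A", "C"),
  MAF = c(0.20, 0.30, 0.05, 0.005, 0.45),
  stringsAsFactors = FALSE
)

maf_filter <- 0.01  # same default as in the usage above

# Hypothetical helper: TRUE when the allele pair is its own complement (A/T or C/G).
is_ambiguous <- function(a1, a2) {
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

# Assumption: the threshold is inclusive, i.e. SNPs with MAF >= maf_filter are kept.
keep_maf    <- gwas$MAF >= maf_filter
keep_strand <- !is_ambiguous(gwas$A1, gwas$A2)

filtered <- gwas[keep_maf & keep_strand, ]
print(filtered$SNP)
```

In this toy table rs2 and rs3 are dropped as palindromic and rs4 is dropped for low MAF, leaving rs1 and rs5.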
Examples
if (FALSE) { # \dontrun{
# Process exposure data
exposure_processed <- preprocess_gwas_data(
  gwas_data = "path/to/raw_exposure_gwas.txt",
  output_file = "path/to/processed_exposure_gwas.rds",
  data_type = "exposure",
  Ninput = 10000
)
# Process outcome data
outcome_processed <- preprocess_gwas_data(
  gwas_data = "path/to/raw_outcome_gwas.txt",
  output_file = "path/to/processed_outcome_gwas.rds",
  data_type = "outcome",
  Ninput = 15000
)
} # }
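The arguments above also allow a data frame as input and explicit column names in place of auto-detection. The sketch below shows what such a call might look like. The data frame raw_gwas, its column names, and all of its values are made up for illustration, and whether the function accepts a table this small has not been verified here. The call is guarded so the block runs even when MRcare is not installed.

```r
# Toy data frame with non-standard column names (hypothetical values).
raw_gwas <- data.frame(
  rsid          = c("rs1", "rs2", "rs3"),
  chrom         = c(1, 1, 2),
  bp            = c(10500, 20400, 30900),
  effect_allele = c("A", "C", "G"),
  other_allele  = c("G", "T", "A"),
  b             = c(0.02, -0.01, 0.03),
  se            = c(0.005, 0.004, 0.006),
  p             = c(6.3e-05, 1.2e-02, 5.7e-07),
  maf           = c(0.20, 0.35, 0.12),
  stringsAsFactors = FALSE
)

# Only attempt the call when the package is available.
if (requireNamespace("MRcare", quietly = TRUE)) {
  processed <- MRcare::preprocess_gwas_data(
    gwas_data   = raw_gwas,                   # data frame instead of a file path
    output_file = tempfile(fileext = ".rds"), # temporary output location
    data_type   = "exposure",
    SNP  = "rsid",
    CHR  = "chrom",
    POS  = "bp",
    A1   = "effect_allele",
    A2   = "other_allele",
    BETA = "b",
    SE   = "se",
    Pval = "p",
    MAF  = "maf",
    Ninput = 10000                            # no sample size column in raw_gwas
  )
}
```

Naming the columns explicitly avoids relying on detection when a file uses unusual headers.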