Filter the genotypes (strains)
Filter_genotypes.Rd
This function performs step 2 of chapter 1 of the "Using VCFtoGWAS package" markdown series.
Based on an array of specific strains, the genotype matrix is filtered to only contain those strains.
Afterwards, rows whose SNPs have become irrelevant (they add no information for the remaining strains) are omitted.
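The two steps described above can be sketched in base R. This is only an illustration and not the package's implementation: the toy matrix, the strain names, and the assumption that 0 encodes "variant absent" are all hypothetical.

```r
# Toy genotype matrix (hypothetical data): rows are SNPs, columns are strains.
genotype_matrix <- matrix(
  c(0, 1, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("snp1", "snp2", "snp3"),
                  c("strainA", "strainB", "strainC"))
)

# Strains to keep (hypothetical names).
strains <- c("strainA", "strainB")

# Step 1: keep only the requested strains (columns).
# drop = FALSE keeps the result a matrix even if a single column remains.
filtered <- genotype_matrix[, colnames(genotype_matrix) %in% strains, drop = FALSE]

# Step 2: drop SNP rows that carry no variant in any retained strain.
# Assumption: 0 means the variant is absent in that strain.
informative <- rowSums(filtered != 0, na.rm = TRUE) > 0
filtered <- filtered[informative, , drop = FALSE]
```

In this toy case snp3 only varies in strainC, so once strainC is removed the row adds no information and is dropped.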
Usage
Filter_genotypes(genotype_matrix,
                 fixed_data,
                 strains = "",
                 dir_results = getwd(),
                 results_name = name_by_time(),
                 do_save = TRUE,
                 on_columns = TRUE,
                 filter_zeros = TRUE)
Arguments
- genotype_matrix
Genotype matrix where rows are SNPs (variants) and columns are strains (genotypes), as created in step 1 by Upload_vcf_to_R
- fixed_data
The fixed information extracted from the VCF (after step 1)
- strains
An array of the strains that you wish to keep (if empty (default), all are kept)
- dir_results
The directory in which a folder will be created and the results will be saved. Make sure it exists!
- results_name
The name of the folder in which the results will be saved within dir_results (default is a time stamp, see create_directory)
- do_save
Whether to save the results (they will be saved as RDS files). Default is TRUE.
- on_columns
All actions assume that the strain names appear in the column names. If for some reason the entered matrix is transposed (strains in row names), then this parameter should be given as FALSE (default is TRUE).
- filter_zeros
TRUE by default. The function always filters out variants that are NA in all strains. If this parameter is TRUE, it also filters out variant rows for which the variant doesn't appear in any of the genotypes.
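As a rough illustration of how on_columns and filter_zeros interact, the documented row filters might look like the base R sketch below. The helper filter_rows_sketch is hypothetical and not part of the package, the encoding of "variant absent" as 0 is an assumption, and whether the real function returns a transposed input in its original orientation is not stated here.

```r
# Hypothetical helper (not part of the package) mimicking the documented filters.
filter_rows_sketch <- function(m, on_columns = TRUE, filter_zeros = TRUE) {
  # If strains are in the row names, transpose so that rows are variants.
  if (!on_columns) m <- t(m)
  # Always drop variants that are NA in every strain.
  keep <- rowSums(!is.na(m)) > 0
  if (filter_zeros) {
    # Assumed encoding: 0 means the variant is absent in that strain.
    keep <- keep & rowSums(m != 0, na.rm = TRUE) > 0
  }
  m[keep, , drop = FALSE]
}

# Toy data (hypothetical): snp1 is all NA, snp2 is all zeros, snp3 has a variant.
m <- matrix(
  c(NA, NA,
    0,  0,
    1,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("snp1", "snp2", "snp3"), c("strainA", "strainB"))
)

kept_default <- rownames(filter_rows_sketch(m))
kept_no_zero_filter <- rownames(filter_rows_sketch(m, filter_zeros = FALSE))
kept_transposed <- rownames(filter_rows_sketch(t(m), on_columns = FALSE))
```

With the defaults only snp3 survives. With filter_zeros = FALSE the all-zero row snp2 is kept as well, while the all-NA row snp1 is removed in both cases.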
Details
More on the use of this function can be found in the relevant section of chapter 1.