
This function implements step 2 of chapter 1 of the "Using VCFtoGWAS package" markdown series.
Given an array of specific strains, the genotype matrix is filtered to contain only those strains.
Afterwards, rows that now contain uninformative SNPs (ones that add no information) are omitted.

Usage

Filter_genotypes(genotype_matrix,
                 fixed_data,
                 strains = "",
                 dir_results = getwd(),
                 results_name = name_by_time(),
                 do_save = TRUE,
                 on_columns = TRUE,
                 filter_zeros = TRUE)

Arguments

genotype_matrix

Genotype matrix where rows are SNPs (variants) and columns are strains (genotypes), as created in step 1 by Upload_vcf_to_R

fixed_data

The fixed information extracted from the VCF (after step 1)

strains

An array of the strains that you wish to keep (if empty (default), all are kept)
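
As an illustration of what keeping only selected strains means, the following is a minimal sketch in base R and not the package's own code. The toy matrix, the strain names, and the choice to silently skip requested strains that are missing from the matrix are all assumptions made for this example.

```r
# Toy genotype matrix: rows are SNPs, columns are strains (hypothetical names).
gt <- matrix(c("0", "1", NA,
               "1", "1", "0"),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("snp1", "snp2"),
                             c("strainA", "strainB", "strainC")))

# Strains to keep; "" mirrors the documented default (keep everything).
strains <- c("strainA", "strainC")

if (length(strains) == 0 || identical(strains, "")) {
  # Nothing requested: all strains are kept.
  gt_sub <- gt
} else {
  # Assumption: requested strains absent from the matrix are ignored.
  keep <- intersect(strains, colnames(gt))
  # drop = FALSE keeps the result a matrix even if one strain remains.
  gt_sub <- gt[, keep, drop = FALSE]
}

print(gt_sub)
```

Using `drop = FALSE` matters here: without it, selecting a single strain would collapse the matrix to a vector and lose the SNP-by-strain layout.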

dir_results

The directory in which a folder will be created and results will be saved. Make sure it exists!

results_name

The name of the folder in which the results will be saved within dir_results (default is a time stamp, see create_directory)

do_save

Do you wish to save the results? (will be saved as RDS files) (Default is TRUE)

on_columns

All actions assume that the strain names appear in the column names. If for some reason the entered matrix is transposed (strains in the row names), then this parameter should be given as FALSE (default is TRUE).
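
The orientation issue can be pictured with a short base R sketch. It only shows the idea of bringing a transposed matrix into the SNPs-by-strains layout; how the function handles this internally, and in which orientation it returns the result, is not stated here. The data and names are invented.

```r
# Toy matrix entered "the wrong way round": strains in rows, SNPs in columns.
gt_in <- matrix(c(0, 1,
                  1, 0,
                  NA, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("strainA", "strainB", "strainC"),
                                c("snp1", "snp2")))

# FALSE signals that strain names are in the row names.
on_columns <- FALSE

# Assumption: a plain transpose is enough to reach the expected layout.
gt <- if (on_columns) gt_in else t(gt_in)

print(dim(gt))       # SNPs x strains
print(colnames(gt))  # strain names are now column names
```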

filter_zeros

TRUE by default. The function filters variants that are all NA. If this parameter is TRUE, it also filters variant rows for which the variant doesn't appear in any of the genotypes.
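
The two row filters described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: genotype calls are assumed to be coded as "0" (variant absent), "1" (variant present) or NA, and the rows of the toy `fixed` data frame are assumed to line up one-to-one with the rows of the genotype matrix.

```r
# Toy genotype matrix (assumed coding: "0" = absent, "1" = present, NA = missing).
gt <- matrix(c("0", "1", "0",    # snp1: informative
               NA,  NA,  NA,     # snp2: all NA
               "0", "0", NA,     # snp3: variant absent in every strain
               "1", NA,  "1"),   # snp4: informative
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("snp", 1:4),
                             c("strainA", "strainB", "strainC")))

# Toy fixed information, one row per SNP (hypothetical columns).
fixed <- data.frame(CHROM = c("chr1", "chr1", "chr2", "chr2"),
                    POS   = c(101, 202, 303, 404))

filter_zeros <- TRUE

# Rows with no genotype call at all.
all_na <- rowSums(!is.na(gt)) == 0

# Rows where no non-missing call differs from "0".
no_variant <- rowSums(gt != "0", na.rm = TRUE) == 0

keep <- !all_na
if (filter_zeros) {
  keep <- keep & !no_variant
}

# Apply the same row selection to both objects so they stay aligned.
gt_filt  <- gt[keep, , drop = FALSE]
fix_filt <- fixed[keep, , drop = FALSE]

print(rownames(gt_filt))
```

Filtering both objects with one logical index keeps each remaining SNP paired with its fixed information.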

Details

More on the use of this function can be found at the relevant section in chapter 1.

Value

fix_filt

A dataframe of the fixed information. All rows have at least one "useful" value

gt_GTonly_filt

A matrix with the desired strains (after filtration)

Author

Tomer Antman