Prediction of Breeding Values Using MiX99
qzwslrm_ebv_mix99.Rmd
MiX99
Available versions of MiX99 are linked into the directory
/qualstorzws01/data_projekte/linuxBin
. The latest version
are available with the suffix pro
attached to each of the
commands. From the examples directory, the way how different analyses
are run can be seen. The basic commands are
mix99i_pro AM.clm > mix99i.log
mix99s_pro -s > mix99s.log
where the required parameters are specified in the so-called CLIM
file AM.clm
.
Example Data
As described in a separate vignette on Generating Test Data an example dataset is simulated using QMSim. The resulting simulation output is prepared to be analysed by MiX99. In order to have a simple pipeline for the analysis, the input data is copied into a working directory and from there the analysis is started.
Prepare MiX99 Input Files
This section shows the manual preparation of the MiX99 input files.
This can easier be done by using the package
qzwsmix99rt
.
s_mix99_dir <- system.file("extdata", "mix99", "p1_example", package = "qzwslrm")
list.files(path = s_mix99_dir)
#> [1] "p1_data_001.clm" "p1_data_001.dat" "p1_data_001.ped" "p1_data_001.var"
The content of the directory
/tmp/RtmpQ6Sl2w/temp_libpath1e0d2c67f2/qzwslrm/extdata/mix99/p1_example
is copied into a working directory. This is done with the function
prepare_example_p1()
which returns the working directory to
where the data files were copied
s_mix99_work_dir <- qzwslrm::prepare_example_p1()
list.files(s_mix99_work_dir)
Run MiX99
As shown in the documentation, an evaluation with MiX99 is run in two
steps. The first step runs mix99i_pro
with the CLIM file as
input. Then mix99s_pro
solves the system of equations. The
two commands are prepared and the program is run via the
system()
function.
s_clim_file <- "p1_data_001.clm"
s_mix99_cmd <- paste0("cd ", s_mix99_work_dir, " && mix99i_pro ", s_clim_file, " && mix99s_pro -s ")
vec_mix99_out <- system(command = s_mix99_cmd, intern = TRUE)
The output is captured in the character vector
vec_mix99_out
. This can be inspected by
and
Clean Up Work Directory
if (fs::dir_exists(path = s_mix99_work_dir)) fs::dir_delete(path = s_mix99_work_dir)
Data Preparation Using the Package qzwsmix99rt
The package qzwsmix99rt
can be used to convert QMSim
output to MiX99 input for a simple animal model with sex
as
a fixed effect. If required, the model complexity can be extended.
The first step is to check whether the package
qzwsmix99rt
is available.
# remove.packages("qzwsmix99rt")
if (!is.element("qzwsmix99rt", installed.packages()))
remotes::install_github(repo = "fbzwsqualitasag/qzwsmix99rt", dependencies = TRUE, upgrade = 'never')
#> Downloading GitHub repo fbzwsqualitasag/qzwsmix99rt@HEAD
#> * checking for file ‘/tmp/Rtmp3LJDCm/remotes596c04458a/fbzwsqualitasag-qzwsmix99rt-3bd9118/DESCRIPTION’ ... OK
#> * preparing ‘qzwsmix99rt’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> Omitted ‘LazyData’ from DESCRIPTION
#> * building ‘qzwsmix99rt_0.1.0.tar.gz’
#> Installing package into '/tmp/RtmpQ6Sl2w/temp_libpath1e0d2c67f2'
#> (as 'lib' is unspecified)
Conversion of QMSim Ouput to MiX99 Input
The QMSim output file is given by
(s_qm_out_path <- system.file("extdata", "qmsim", "p1_data_001.txt", package = "qzwslrm"))
#> [1] "/tmp/RtmpQ6Sl2w/temp_libpath1e0d2c67f2/qzwslrm/extdata/qmsim/p1_data_001.txt"
The conversion is done by
s_tmp_dir <- tempdir()
s_out_dir_whole <- file.path(s_tmp_dir, 'mix99_work_whole_data')
qzwsmix99rt::convert_qmsim_to_mix99(ps_qm_path = s_qm_out_path,
ps_out_dir = s_out_dir_whole,
pl_vcmp = list(genetic_variance = 0.1, residual_variance = 0.9))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> Progeny = col_double(),
#> Sire = col_double(),
#> Dam = col_double(),
#> Sex = col_character(),
#> G = col_double(),
#> NMPrg = col_double(),
#> NFPrg = col_double(),
#> F = col_double(),
#> Homo = col_double(),
#> Phen = col_double(),
#> Res = col_double(),
#> Polygene = col_double(),
#> QTL = col_double(),
#> Final_EBV = col_double()
#> )
The output is checked with
list.files(path = s_out_dir_whole)
#> [1] "20220316092647_mix99.clm" "20220316092647_mix99.dat"
#> [3] "20220316092647_mix99.ped" "20220316092647_mix99.var"
Run MiX99
MiX99 was run on the created input files.
qzwsmix99rt::run_mix99(ps_work_dir = s_out_dir_whole)
Check the results
list.files(path = s_out_dir_whole)
#> [1] "20220316092647_mix99.clm" "20220316092647_mix99.dat"
#> [3] "20220316092647_mix99.ped" "20220316092647_mix99.var"
#> [5] "20220316092655_mix99.log" "ARlog"
#> [7] "Conlog" "Memlog"
#> [9] "MiX99_DIR.DIR" "MiX99_IN.DIR"
#> [11] "MiX99_IN.OPT" "MiX99.lst"
#> [13] "Modlog" "OK_mix99i"
#> [15] "OK_mix99s" "original_blockcode.dat"
#> [17] "Parlog" "Resid.List"
#> [19] "Solani" "Solfix"
#> [21] "Solvec" "Tm10.trco0"
#> [23] "Tmp0.para" "Tmp1.code"
#> [25] "Tmp4.pedi0" "Tmp5.clas0"
#> [27] "Tmp6.diab0" "Tmp9.strc"
#> [29] "Tralog"
The breeding values are stored in the result file “Solani”. This file can be read with
s_solani_path_whole <- file.path(s_out_dir_whole, "Solani")
vec_col_names <- c("animal", "nrprg", "trait", "ebv")
l_col_types <- readr::cols(animal = readr::col_integer(),
nprg = readr::col_integer(),
trait = readr::col_integer(),
ebv = readr::col_double())
tbl_solani_whole <- readr::read_table(file = s_solani_path_whole, col_names = vec_col_names, col_types = l_col_types)
#> Warning: The following named parsers don't match the column names: nprg
#> Warning: 3844 parsing failures.
#> row col expected actual file
#> 1 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#> 2 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#> 3 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#> 4 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#> 6 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#> ... ... ......... ......... ..............................................
#> See problems(...) for more details.
df_solani_whole <- read.table(file = s_solani_path_whole, sep = "", header = FALSE)
df_solani_whole[1:10,]
#> V1 V2 V3 V4
#> 1 1 10 1 -0.2142200
#> 2 2 30 1 0.1096000
#> 3 3 90 1 0.4845300
#> 4 4 10 1 -0.1543200
#> 5 5 20 1 0.0766110
#> 6 6 20 1 -0.1354800
#> 7 7 20 1 -0.0928440
#> 8 8 10 1 -0.0059609
#> 9 9 10 1 -0.0130210
#> 10 10 60 1 0.1806400
Comparing EBVs
summary(df_solani_whole[,4])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.3550 0.0847 0.2500 0.2556 0.4163 0.8965
sum(df_solani_whole[,4])
#> [1] 1276.597
The different properties of the columns of the ebv result file is
summary(tbl_solani_whole$ebv)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.3550 0.0847 0.2500 0.2556 0.4163 0.8965
Depending on the contrast chosen or the restrictions, the sum should add up to \(0\).
sum(tbl_solani_whole$ebv)
#> [1] 1276.597
Reading solani with fread()
dt_solani_whole <- data.table::fread(file = s_solani_path_whole, header = FALSE)
dt_solani_whole[1:10,]
#> V1 V2 V3 V4
#> 1: 1 10 1 -0.2142200
#> 2: 2 30 1 0.1096000
#> 3: 3 90 1 0.4845300
#> 4: 4 10 1 -0.1543200
#> 5: 5 20 1 0.0766110
#> 6: 6 20 1 -0.1354800
#> 7: 7 20 1 -0.0928440
#> 8: 8 10 1 -0.0059609
#> 9: 9 10 1 -0.0130210
#> 10: 10 60 1 0.1806400
summary(dt_solani_whole[,4])
#> V4
#> Min. :-0.3550
#> 1st Qu.: 0.0847
#> Median : 0.2500
#> Mean : 0.2556
#> 3rd Qu.: 0.4163
#> Max. : 0.8965
sum(dt_solani_whole[,4])
#> [1] 1276.597
Partial Data
In the partial data, we are setting all phenotypes from the last generation to missing and are predicting breeding values based on this reduced dataset. Based on the QMSim output, we have to determine which records have to be set to missing.
tbl_qm_partial <- qzwsmix99rt::read_qmsim(ps_path = s_qm_out_path)
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> Progeny = col_double(),
#> Sire = col_double(),
#> Dam = col_double(),
#> Sex = col_character(),
#> G = col_double(),
#> NMPrg = col_double(),
#> NFPrg = col_double(),
#> F = col_double(),
#> Homo = col_double(),
#> Phen = col_double(),
#> Res = col_double(),
#> Polygene = col_double(),
#> QTL = col_double(),
#> Final_EBV = col_double()
#> )
tbl_qm_partial
#> # A tibble: 4,995 × 14
#> Progeny Sire Dam Sex G NMPrg NFPrg F Homo Phen Res Polygene
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 M 0 4 6 0 0.701 NA 0 0
#> 2 2 0 0 M 0 14 16 0 0.703 NA 0 0
#> 3 3 0 0 M 0 35 55 0 0.693 NA 0 0
#> 4 4 0 0 M 0 5 5 0 0.710 NA 0 0
#> 5 5 0 0 M 0 12 8 0 0.694 NA 0 0
#> 6 6 0 0 M 0 7 13 0 0.694 NA 0 0
#> 7 7 0 0 M 0 13 7 0 0.704 NA 0 0
#> 8 8 0 0 M 0 3 7 0 0.689 NA 0 0
#> 9 9 0 0 M 0 5 5 0 0.699 NA 0 0
#> 10 10 0 0 M 0 19 41 0 0.697 NA 0 0
#> # … with 4,985 more rows, and 2 more variables: QTL <dbl>, Final_EBV <dbl>
The generation for each record is contained in
summary(tbl_qm_partial$G)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.000 2.000 5.000 4.955 8.000 10.000
All animals from the last generation are set to missing in the phenotype column
tbl_qm_partial[tbl_qm_partial$G == 10, ]
#> # A tibble: 450 × 14
#> Progeny Sire Dam Sex G NMPrg NFPrg F Homo Phen Res
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4546 1878 2400 F 10 0 0 0.00781 0.688 -0.165 -0.529
#> 2 4547 4452 1757 M 10 0 0 0 0.706 NA 0
#> 3 4548 2795 3242 M 10 0 0 0.0156 0.720 NA 0
#> 4 4549 3229 3911 F 10 0 0 0.0268 0.719 0.499 -0.576
#> 5 4550 4545 4265 F 10 0 0 0.0341 0.695 1.91 1.16
#> 6 4551 4497 2332 F 10 0 0 0.0200 0.717 0.300 -0.340
#> 7 4552 2762 3135 M 10 0 0 0.0127 0.699 NA 0
#> 8 4553 4330 312 F 10 0 0 0 0.699 -1.11 -1.68
#> 9 4554 3853 3864 M 10 0 0 0.0189 0.708 NA 0
#> 10 4555 3423 2754 M 10 0 0 0.0215 0.695 NA 0
#> # … with 440 more rows, and 3 more variables: Polygene <dbl>, QTL <dbl>,
#> # Final_EBV <dbl>
All phenotypes of these animals are set to missing.
tbl_qm_partial[tbl_qm_partial$G == 10, ]$Phen <- NA
tbl_qm_partial[tbl_qm_partial$G == 10, ]
#> # A tibble: 450 × 14
#> Progeny Sire Dam Sex G NMPrg NFPrg F Homo Phen Res
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4546 1878 2400 F 10 0 0 0.00781 0.688 NA -0.529
#> 2 4547 4452 1757 M 10 0 0 0 0.706 NA 0
#> 3 4548 2795 3242 M 10 0 0 0.0156 0.720 NA 0
#> 4 4549 3229 3911 F 10 0 0 0.0268 0.719 NA -0.576
#> 5 4550 4545 4265 F 10 0 0 0.0341 0.695 NA 1.16
#> 6 4551 4497 2332 F 10 0 0 0.0200 0.717 NA -0.340
#> 7 4552 2762 3135 M 10 0 0 0.0127 0.699 NA 0
#> 8 4553 4330 312 F 10 0 0 0 0.699 NA -1.68
#> 9 4554 3853 3864 M 10 0 0 0.0189 0.708 NA 0
#> 10 4555 3423 2754 M 10 0 0 0.0215 0.695 NA 0
#> # … with 440 more rows, and 3 more variables: Polygene <dbl>, QTL <dbl>,
#> # Final_EBV <dbl>
This partial dataset is used for a prediction of breeding values with partial data.
s_out_dir_partial <- file.path(s_tmp_dir, 'mix99_work_partial_data')
qzwsmix99rt::convert_qmsim_to_mix99(ptbl_qm = tbl_qm_partial,
ps_out_dir = s_out_dir_partial,
pl_vcmp = list(genetic_variance = 0.1, residual_variance = 0.9))
Checking the content of the working directory for the partial dataset
list.files(s_out_dir_partial)
#> [1] "20220316092656_mix99.clm" "20220316092656_mix99.dat"
#> [3] "20220316092656_mix99.ped" "20220316092656_mix99.var"
Breeding values are predicted using
qzwsmix99rt::run_mix99(ps_work_dir = s_out_dir_partial)
Checking the results
list.files(s_out_dir_partial)
#> [1] "20220316092656_mix99.clm" "20220316092656_mix99.dat"
#> [3] "20220316092656_mix99.ped" "20220316092656_mix99.var"
#> [5] "20220316092657_mix99.log" "ARlog"
#> [7] "Conlog" "Memlog"
#> [9] "MiX99_DIR.DIR" "MiX99_IN.DIR"
#> [11] "MiX99_IN.OPT" "MiX99.lst"
#> [13] "Modlog" "OK_mix99i"
#> [15] "OK_mix99s" "original_blockcode.dat"
#> [17] "Parlog" "Resid.List"
#> [19] "Solani" "Solfix"
#> [21] "Solvec" "Tm10.trco0"
#> [23] "Tmp0.para" "Tmp1.code"
#> [25] "Tmp4.pedi0" "Tmp5.clas0"
#> [27] "Tmp6.diab0" "Tmp9.strc"
#> [29] "Tralog"
s_solani_path_partial <- file.path(s_out_dir_partial, "Solani")
vec_col_names <- c("animal", "nrprg", "trait", "ebv")
l_col_types <- readr::cols(animal = readr::col_integer(),
nprg = readr::col_integer(),
trait = readr::col_integer(),
ebv = readr::col_double())
tbl_solani_partial <- readr::read_table(file = s_solani_path_partial, col_names = vec_col_names, col_types = l_col_types)
#> Warning: The following named parsers don't match the column names: nprg
#> Warning: 3839 parsing failures.
#> row col expected actual file
#> 1 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#> 2 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#> 3 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#> 4 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#> 6 -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#> ... ... ......... ......... ................................................
#> See problems(...) for more details.
The same type of statistics with the ebv from the partial data
summary(tbl_solani_partial$ebv)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.35504 0.08577 0.24813 0.24947 0.42129 0.74823
Statistics based on the LR-Methods
As a first preparatory, step, we extract the vector of ebv from the tibbles
vec_ebv_p <- tbl_solani_partial$ebv
vec_ebv_w <- tbl_solani_whole$ebv
length(vec_ebv_p)
#> [1] 4995
length(vec_ebv_w)
#> [1] 4995
The following statistics are computed
- bias: \(\overline{\hat{u}_p} - \overline{\hat{u}_w}\)
- regression of whole on partial: \(b_{w,p} = \frac{cov(\hat{u}_w,\hat{u}_p)}{var(\hat{u}_p)}\)
- correlation between whole and partial: \(r_{w,p} = \frac{cov(\hat{u}_w,\hat{u}_p)}{\sqrt{var(\hat{u}_p)var(\hat{u}_w)}}\)
(n_cor_wp <- cor(vec_ebv_p, vec_ebv_w))
#> [1] 0.987303
- regression of partial on whole: \(b_{p,w} = \frac{cov(\hat{u}_w,\hat{u}_p)}{var(\hat{u}_w)}\)
Clean Up Work Directory
if (fs::dir_exists(path = s_tmp_dir)) fs::dir_delete(path = s_tmp_dir)