Prediction of Breeding Values Using MiX99

Disclaimer

The required steps to run the prediction of breeding values is documented.

MiX99

Available versions of MiX99 are linked into the directory /qualstorzws01/data_projekte/linuxBin. The latest version are available with the suffix pro attached to each of the commands. From the examples directory, the way how different analyses are run can be seen. The basic commands are

mix99i_pro AM.clm  > mix99i.log
mix99s_pro -s      > mix99s.log

where the required parameters are specified in the so-called CLIM file AM.clm.

Example Data

As described in a separate vignette on Generating Test Data an example dataset is simulated using QMSim. The resulting simulation output is prepared to be analysed by MiX99. In order to have a simple pipeline for the analysis, the input data is copied into a working directory and from there the analysis is started.

Prepare MiX99 Input Files

This section shows the manual preparation of the MiX99 input files. This can easier be done by using the package qzwsmix99rt.

s_mix99_dir <- system.file("extdata", "mix99", "p1_example", package = "qzwslrm")
list.files(path = s_mix99_dir)
#> [1] "p1_data_001.clm" "p1_data_001.dat" "p1_data_001.ped" "p1_data_001.var"

The content of the directory /tmp/RtmpQ6Sl2w/temp_libpath1e0d2c67f2/qzwslrm/extdata/mix99/p1_example is copied into a working directory. This is done with the function prepare_example_p1() which returns the working directory to where the data files were copied

s_mix99_work_dir <- qzwslrm::prepare_example_p1()
list.files(s_mix99_work_dir)

Run MiX99

As shown in the documentation, an evaluation with MiX99 is run in two steps. The first step runs mix99i_pro with the CLIM file as input. Then mix99s_pro solves the system of equations. The two commands are prepared and the program is run via the system() function.

s_clim_file <- "p1_data_001.clm"
s_mix99_cmd <- paste0("cd ", s_mix99_work_dir, " && mix99i_pro ", s_clim_file, " && mix99s_pro -s ")
vec_mix99_out <- system(command = s_mix99_cmd, intern = TRUE)

The output is captured in the character vector vec_mix99_out. This can be inspected by

cat(paste0(head(vec_mix99_out), collapse = "\n"), "\n")

and

cat(paste0(tail(vec_mix99_out), collapse = "\n"), "\n")

Clean Up Work Directory

if (fs::dir_exists(path = s_mix99_work_dir)) fs::dir_delete(path = s_mix99_work_dir)

Data Preparation Using the Package `qzwsmix99rt`

The package qzwsmix99rt can be used to convert QMSim output to MiX99 input for a simple animal model with sex as a fixed effect. If required, the model complexity can be extended.

The first step is to check whether the package qzwsmix99rt is available.

# remove.packages("qzwsmix99rt")
if (!is.element("qzwsmix99rt", installed.packages())) 
  remotes::install_github(repo = "fbzwsqualitasag/qzwsmix99rt", dependencies = TRUE, upgrade = 'never')
#> Downloading GitHub repo fbzwsqualitasag/qzwsmix99rt@HEAD
#> * checking for file ‘/tmp/Rtmp3LJDCm/remotes596c04458a/fbzwsqualitasag-qzwsmix99rt-3bd9118/DESCRIPTION’ ... OK
#> * preparing ‘qzwsmix99rt’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> Omitted ‘LazyData’ from DESCRIPTION
#> * building ‘qzwsmix99rt_0.1.0.tar.gz’
#> Installing package into '/tmp/RtmpQ6Sl2w/temp_libpath1e0d2c67f2'
#> (as 'lib' is unspecified)

Conversion of QMSim Ouput to MiX99 Input

The QMSim output file is given by

(s_qm_out_path <- system.file("extdata", "qmsim", "p1_data_001.txt", package = "qzwslrm"))
#> [1] "/tmp/RtmpQ6Sl2w/temp_libpath1e0d2c67f2/qzwslrm/extdata/qmsim/p1_data_001.txt"

The conversion is done by

s_tmp_dir <- tempdir()
s_out_dir_whole <- file.path(s_tmp_dir, 'mix99_work_whole_data')
qzwsmix99rt::convert_qmsim_to_mix99(ps_qm_path = s_qm_out_path,
                                    ps_out_dir = s_out_dir_whole,
                                    pl_vcmp      = list(genetic_variance = 0.1, residual_variance = 0.9))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Progeny = col_double(),
#>   Sire = col_double(),
#>   Dam = col_double(),
#>   Sex = col_character(),
#>   G = col_double(),
#>   NMPrg = col_double(),
#>   NFPrg = col_double(),
#>   F = col_double(),
#>   Homo = col_double(),
#>   Phen = col_double(),
#>   Res = col_double(),
#>   Polygene = col_double(),
#>   QTL = col_double(),
#>   Final_EBV = col_double()
#> )

The output is checked with

list.files(path = s_out_dir_whole)
#> [1] "20220316092647_mix99.clm" "20220316092647_mix99.dat"
#> [3] "20220316092647_mix99.ped" "20220316092647_mix99.var"

Run MiX99

MiX99 was run on the created input files.

qzwsmix99rt::run_mix99(ps_work_dir = s_out_dir_whole)

Check the results

list.files(path = s_out_dir_whole)
#>  [1] "20220316092647_mix99.clm" "20220316092647_mix99.dat"
#>  [3] "20220316092647_mix99.ped" "20220316092647_mix99.var"
#>  [5] "20220316092655_mix99.log" "ARlog"                   
#>  [7] "Conlog"                   "Memlog"                  
#>  [9] "MiX99_DIR.DIR"            "MiX99_IN.DIR"            
#> [11] "MiX99_IN.OPT"             "MiX99.lst"               
#> [13] "Modlog"                   "OK_mix99i"               
#> [15] "OK_mix99s"                "original_blockcode.dat"  
#> [17] "Parlog"                   "Resid.List"              
#> [19] "Solani"                   "Solfix"                  
#> [21] "Solvec"                   "Tm10.trco0"              
#> [23] "Tmp0.para"                "Tmp1.code"               
#> [25] "Tmp4.pedi0"               "Tmp5.clas0"              
#> [27] "Tmp6.diab0"               "Tmp9.strc"               
#> [29] "Tralog"

The breeding values are stored in the result file “Solani”. This file can be read with

s_solani_path_whole <- file.path(s_out_dir_whole, "Solani")
vec_col_names <- c("animal", "nrprg", "trait", "ebv")
l_col_types <- readr::cols(animal = readr::col_integer(),
                           nprg = readr::col_integer(),
                           trait = readr::col_integer(),
                           ebv = readr::col_double())
tbl_solani_whole <- readr::read_table(file = s_solani_path_whole, col_names = vec_col_names, col_types = l_col_types)
#> Warning: The following named parsers don't match the column names: nprg
#> Warning: 3844 parsing failures.
#> row col  expected    actual                                           file
#>   1  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#>   2  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#>   3  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#>   4  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#>   6  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_whole_data/Solani'
#> ... ... ......... ......... ..............................................
#> See problems(...) for more details.

df_solani_whole <- read.table(file = s_solani_path_whole, sep = "", header = FALSE)
df_solani_whole[1:10,]
#>    V1 V2 V3         V4
#> 1   1 10  1 -0.2142200
#> 2   2 30  1  0.1096000
#> 3   3 90  1  0.4845300
#> 4   4 10  1 -0.1543200
#> 5   5 20  1  0.0766110
#> 6   6 20  1 -0.1354800
#> 7   7 20  1 -0.0928440
#> 8   8 10  1 -0.0059609
#> 9   9 10  1 -0.0130210
#> 10 10 60  1  0.1806400

Comparing EBVs

summary(df_solani_whole[,4])
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> -0.3550  0.0847  0.2500  0.2556  0.4163  0.8965

sum(df_solani_whole[,4])
#> [1] 1276.597

The different properties of the columns of the ebv result file is

summary(tbl_solani_whole$ebv)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> -0.3550  0.0847  0.2500  0.2556  0.4163  0.8965

Depending on the contrast chosen or the restrictions, the sum should add up to \(0\).

sum(tbl_solani_whole$ebv)
#> [1] 1276.597

Reading solani with fread()

dt_solani_whole <- data.table::fread(file = s_solani_path_whole, header = FALSE)
dt_solani_whole[1:10,]
#>     V1 V2 V3         V4
#>  1:  1 10  1 -0.2142200
#>  2:  2 30  1  0.1096000
#>  3:  3 90  1  0.4845300
#>  4:  4 10  1 -0.1543200
#>  5:  5 20  1  0.0766110
#>  6:  6 20  1 -0.1354800
#>  7:  7 20  1 -0.0928440
#>  8:  8 10  1 -0.0059609
#>  9:  9 10  1 -0.0130210
#> 10: 10 60  1  0.1806400

summary(dt_solani_whole[,4])
#>        V4         
#>  Min.   :-0.3550  
#>  1st Qu.: 0.0847  
#>  Median : 0.2500  
#>  Mean   : 0.2556  
#>  3rd Qu.: 0.4163  
#>  Max.   : 0.8965

sum(dt_solani_whole[,4])
#> [1] 1276.597

Partial Data

In the partial data, we are setting all phenotypes from the last generation to missing and are predicting breeding values based on this reduced dataset. Based on the QMSim output, we have to determine which records have to be set to missing.

tbl_qm_partial <- qzwsmix99rt::read_qmsim(ps_path = s_qm_out_path)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Progeny = col_double(),
#>   Sire = col_double(),
#>   Dam = col_double(),
#>   Sex = col_character(),
#>   G = col_double(),
#>   NMPrg = col_double(),
#>   NFPrg = col_double(),
#>   F = col_double(),
#>   Homo = col_double(),
#>   Phen = col_double(),
#>   Res = col_double(),
#>   Polygene = col_double(),
#>   QTL = col_double(),
#>   Final_EBV = col_double()
#> )
tbl_qm_partial
#> # A tibble: 4,995 × 14
#>    Progeny  Sire   Dam Sex       G NMPrg NFPrg     F  Homo  Phen   Res Polygene
#>      <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
#>  1       1     0     0 M         0     4     6     0 0.701    NA     0        0
#>  2       2     0     0 M         0    14    16     0 0.703    NA     0        0
#>  3       3     0     0 M         0    35    55     0 0.693    NA     0        0
#>  4       4     0     0 M         0     5     5     0 0.710    NA     0        0
#>  5       5     0     0 M         0    12     8     0 0.694    NA     0        0
#>  6       6     0     0 M         0     7    13     0 0.694    NA     0        0
#>  7       7     0     0 M         0    13     7     0 0.704    NA     0        0
#>  8       8     0     0 M         0     3     7     0 0.689    NA     0        0
#>  9       9     0     0 M         0     5     5     0 0.699    NA     0        0
#> 10      10     0     0 M         0    19    41     0 0.697    NA     0        0
#> # … with 4,985 more rows, and 2 more variables: QTL <dbl>, Final_EBV <dbl>

The generation for each record is contained in

summary(tbl_qm_partial$G)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.000   2.000   5.000   4.955   8.000  10.000

All animals from the last generation are set to missing in the phenotype column

tbl_qm_partial[tbl_qm_partial$G == 10, ]
#> # A tibble: 450 × 14
#>    Progeny  Sire   Dam Sex       G NMPrg NFPrg       F  Homo   Phen    Res
#>      <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>   <dbl> <dbl>  <dbl>  <dbl>
#>  1    4546  1878  2400 F        10     0     0 0.00781 0.688 -0.165 -0.529
#>  2    4547  4452  1757 M        10     0     0 0       0.706 NA      0    
#>  3    4548  2795  3242 M        10     0     0 0.0156  0.720 NA      0    
#>  4    4549  3229  3911 F        10     0     0 0.0268  0.719  0.499 -0.576
#>  5    4550  4545  4265 F        10     0     0 0.0341  0.695  1.91   1.16 
#>  6    4551  4497  2332 F        10     0     0 0.0200  0.717  0.300 -0.340
#>  7    4552  2762  3135 M        10     0     0 0.0127  0.699 NA      0    
#>  8    4553  4330   312 F        10     0     0 0       0.699 -1.11  -1.68 
#>  9    4554  3853  3864 M        10     0     0 0.0189  0.708 NA      0    
#> 10    4555  3423  2754 M        10     0     0 0.0215  0.695 NA      0    
#> # … with 440 more rows, and 3 more variables: Polygene <dbl>, QTL <dbl>,
#> #   Final_EBV <dbl>

All phenotypes of these animals are set to missing.

tbl_qm_partial[tbl_qm_partial$G == 10, ]$Phen <- NA
tbl_qm_partial[tbl_qm_partial$G == 10, ]
#> # A tibble: 450 × 14
#>    Progeny  Sire   Dam Sex       G NMPrg NFPrg       F  Homo  Phen    Res
#>      <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl>
#>  1    4546  1878  2400 F        10     0     0 0.00781 0.688    NA -0.529
#>  2    4547  4452  1757 M        10     0     0 0       0.706    NA  0    
#>  3    4548  2795  3242 M        10     0     0 0.0156  0.720    NA  0    
#>  4    4549  3229  3911 F        10     0     0 0.0268  0.719    NA -0.576
#>  5    4550  4545  4265 F        10     0     0 0.0341  0.695    NA  1.16 
#>  6    4551  4497  2332 F        10     0     0 0.0200  0.717    NA -0.340
#>  7    4552  2762  3135 M        10     0     0 0.0127  0.699    NA  0    
#>  8    4553  4330   312 F        10     0     0 0       0.699    NA -1.68 
#>  9    4554  3853  3864 M        10     0     0 0.0189  0.708    NA  0    
#> 10    4555  3423  2754 M        10     0     0 0.0215  0.695    NA  0    
#> # … with 440 more rows, and 3 more variables: Polygene <dbl>, QTL <dbl>,
#> #   Final_EBV <dbl>

This partial dataset is used for a prediction of breeding values with partial data.

s_out_dir_partial <- file.path(s_tmp_dir, 'mix99_work_partial_data')
qzwsmix99rt::convert_qmsim_to_mix99(ptbl_qm    = tbl_qm_partial,
                                    ps_out_dir = s_out_dir_partial,
                                    pl_vcmp    = list(genetic_variance = 0.1, residual_variance = 0.9))

Checking the content of the working directory for the partial dataset

list.files(s_out_dir_partial)
#> [1] "20220316092656_mix99.clm" "20220316092656_mix99.dat"
#> [3] "20220316092656_mix99.ped" "20220316092656_mix99.var"

Breeding values are predicted using

qzwsmix99rt::run_mix99(ps_work_dir = s_out_dir_partial)

Checking the results

list.files(s_out_dir_partial)
#>  [1] "20220316092656_mix99.clm" "20220316092656_mix99.dat"
#>  [3] "20220316092656_mix99.ped" "20220316092656_mix99.var"
#>  [5] "20220316092657_mix99.log" "ARlog"                   
#>  [7] "Conlog"                   "Memlog"                  
#>  [9] "MiX99_DIR.DIR"            "MiX99_IN.DIR"            
#> [11] "MiX99_IN.OPT"             "MiX99.lst"               
#> [13] "Modlog"                   "OK_mix99i"               
#> [15] "OK_mix99s"                "original_blockcode.dat"  
#> [17] "Parlog"                   "Resid.List"              
#> [19] "Solani"                   "Solfix"                  
#> [21] "Solvec"                   "Tm10.trco0"              
#> [23] "Tmp0.para"                "Tmp1.code"               
#> [25] "Tmp4.pedi0"               "Tmp5.clas0"              
#> [27] "Tmp6.diab0"               "Tmp9.strc"               
#> [29] "Tralog"

s_solani_path_partial <- file.path(s_out_dir_partial, "Solani")
vec_col_names <- c("animal", "nrprg", "trait", "ebv")
l_col_types <- readr::cols(animal = readr::col_integer(),
                           nprg = readr::col_integer(),
                           trait = readr::col_integer(),
                           ebv = readr::col_double())
tbl_solani_partial <- readr::read_table(file = s_solani_path_partial, col_names = vec_col_names, col_types = l_col_types)
#> Warning: The following named parsers don't match the column names: nprg
#> Warning: 3839 parsing failures.
#> row col  expected    actual                                             file
#>   1  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#>   2  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#>   3  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#>   4  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#>   6  -- 4 columns 5 columns '/tmp/Rtmp3LJDCm/mix99_work_partial_data/Solani'
#> ... ... ......... ......... ................................................
#> See problems(...) for more details.

The same type of statistics with the ebv from the partial data

summary(tbl_solani_partial$ebv)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -0.35504  0.08577  0.24813  0.24947  0.42129  0.74823

Statistics based on the LR-Methods

As a first preparatory, step, we extract the vector of ebv from the tibbles

vec_ebv_p <- tbl_solani_partial$ebv
vec_ebv_w <- tbl_solani_whole$ebv
length(vec_ebv_p)
#> [1] 4995
length(vec_ebv_w)
#> [1] 4995

The following statistics are computed

bias: \(\overline{\hat{u}_p} - \overline{\hat{u}_w}\)

(n_bias <- mean(vec_ebv_p) - mean(vec_ebv_w))
#> [1] -0.006104794

regression of whole on partial: \(b_{w,p} = \frac{cov(\hat{u}_w,\hat{u}_p)}{var(\hat{u}_p)}\)

(n_reg_wp <- cov(vec_ebv_p, vec_ebv_w) / var(vec_ebv_p))
#> [1] 1.037712

correlation between whole and partial: \(r_{w,p} = \frac{cov(\hat{u}_w,\hat{u}_p)}{\sqrt{var(\hat{u}_p)var(\hat{u}_w)}}\)

(n_cor_wp <- cor(vec_ebv_p, vec_ebv_w))
#> [1] 0.987303

regression of partial on whole: \(b_{p,w} = \frac{cov(\hat{u}_w,\hat{u}_p)}{var(\hat{u}_w)}\)

(n_reg_pw <- cov(vec_ebv_p, vec_ebv_w) / var(vec_ebv_w))
#> [1] 0.9393427

Clean Up Work Directory

if (fs::dir_exists(path = s_tmp_dir)) fs::dir_delete(path = s_tmp_dir)

Disclaimer

MiX99

Example Data

Prepare MiX99 Input Files

Run MiX99

Clean Up Work Directory

Data Preparation Using the Package qzwsmix99rt

Conversion of QMSim Ouput to MiX99 Input

Run MiX99

Partial Data

Statistics based on the LR-Methods

Clean Up Work Directory

Data Preparation Using the Package `qzwsmix99rt`