Tutorial: Single chromosome optimization using OpenMiChroM¶

This tutorial enables performing optimization in MiChroM Parameters (Second-order optimization -> Hessian inversion)¶

The first step is import the OpenMiChroM modules.

To install OpenMM and OpenMiChroM, follow the instalation guide

The inputs and apps used in this tutorial can be downloaded here

Types optimization is available in OpenMichroM version 1.0.5

[39]:

from OpenMiChroM.ChromDynamics import MiChroM #OpenMiChroM simulation module
from OpenMiChroM.Optimization import CustomMiChroMTraining #optimization MiChroM parameters module
from OpenMiChroM.CndbTools import cndbTools #analysis tools module

#modules to load and plot .dense file
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn.preprocessing import normalize
import numpy as np
import pandas as pd
import h5py

The second step is to have a look on the experimental Hi-C

A Hi-C file is required for the analysis and training of the MiChroM Potentials (Types and Ideal Chromosome). The file format chosen here is a matrix .txt file (we call it the dense file).

For this tutorial, we will use chromosome 10 from GM12878 in 100 kb resolution.

To extract it from the .hic file we can use juicer_tools with this command:

java -jar juicer_tools_1.22.01.jar dump observed NONE -d https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic 10 10 BP 100000 input/chr10_100k.dense

[2]:

%%bash
java -jar apps/juicer_tools_1.22.01.jar dump observed Balanced -d https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic 10 10 BP 100000 input/chr10_100k.dense

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-12-07T11:19:30,430]  [Globals.java:138] [main]  Development mode is enabled
INFO [2022-12-07T11:19:34,057]  [DirectoryManager.java:179] [main]  IGV Directory: /home/antonio/igv
INFO [2022-12-07T11:19:35,068]  [HttpUtils.java:937] [main]  Range-byte request succeeded

This command downloads the .hic from the web and extracts the chromosome 10 in .dense format to the folder “input”.

You can get more information about it at the JuicerTools documentation.

Visualize the .dense file for inspection

[3]:

filename = 'input/chr10_100k.dense'
hic_file = np.loadtxt(filename)

r=np.triu(hic_file, k=1)
r[np.isnan(r)]= 0.0
r = normalize(r, axis=1, norm='max')
rd = np.transpose(r)
r=r+rd + np.diag(np.ones(len(r)))
print("number of beads: ", len(r))
plt.matshow(r,norm=mpl.colors.LogNorm(vmin=0.0001, vmax=r.max()),cmap="Reds")
plt.colorbar()

number of beads:  1356

[3]:

<matplotlib.colorbar.Colorbar at 0x7fea4445bfd0>

../_images/Tutorials_Tutorial_MiChroM_Optimization_8_2.png

The Hi-C map has resolution of \(100 kb\) per bead, so the chromosome 10 model has a polymer chain with a total of 1356 beads

The next step is to extract the sequence file (A/B sequence) by using the eigenvector decomposition.

Using the juicertools you can extract the eigenvector file. The eigenvector has values both negatives and positives and here we will arbitrary set positives as A1 and negatives as B1.

For more details about how it works, take a look on this paper: https://pubs.acs.org/doi/full/10.1021/acs.jpcb.1c04174

[4]:

%%bash
java -jar apps/juicer_tools_1.22.01.jar eigenvector -p Balanced https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic 10 BP 100000 input/chr10_100k.eigen

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-12-07T11:23:03,393]  [Globals.java:138] [main]  Development mode is enabled
INFO [2022-12-07T11:23:05,061]  [DirectoryManager.java:179] [main]  IGV Directory: /home/antonio/igv
INFO [2022-12-07T11:23:06,008]  [HttpUtils.java:937] [main]  Range-byte request succeeded

[6]:

eigen = np.loadtxt("input/chr10_100k.eigen")
plt.plot(eigen)

[6]:

[<matplotlib.lines.Line2D at 0x7fe99b2d5090>]

../_images/Tutorials_Tutorial_MiChroM_Optimization_12_1.png

From the .eigen file we can create the A/B sequence file.

[7]:

%%bash
awk '{if ($1 < 0) print "B1"; else print "A1"}' input/chr10_100k.eigen | cat -n > input/seq_chr10_100k.txt
head input/seq_chr10_100k.txt

The pipeline to perform the Types Potential Optimization is:

1 - Run a long simulation using the homopolymer potential + customTypes potential. The first iteration will start with all parameters equal to zero or a initial guess.

2 - Get frames from this simualtion to perform the inversion for Types.

3 - In the end of inversion, new values to types interactions will be produced.

4 - Calcule the error between the simulated and experimental parameters. If the error is above the treshold, re-do steps 1-3 until reaching the treshold (usually 10% or 15%).

The Types file is a .txt file with a matrix labeled with values for each interaction. In this tutorial, we will be training A1 and B1 type.

Lets create the initial file at this format: A1,B1 0,0 0,0

For this matrix, we have AA AB BA BB interactions.

Save it as lambda_0.txt

[8]:

%%bash
echo "A1,B1
0,0
0,0" > input/lambda_0

cat input/lambda_0

A1,B1
0,0
0,0

With all the required inputs, lets perform a simulation for iteration 0

In MiChroM initiation there are some variables to setup:

time_step=0.01 (the time step using for integration, default is 0.01) temperature=1 (Set the temperature of your simulation) name=’opt_chr10_100K’ (the simulation name)

[66]:

sim = MiChroM(name='opt_chr10_100K',temperature=1.0, time_step=0.01)

    ***************************************************************************************
     **** **** *** *** *** *** *** *** OpenMiChroM-1.0.5 *** *** *** *** *** *** **** ****

         OpenMiChroM is a Python library for performing chromatin dynamics simulations.
                            OpenMiChroM uses the OpenMM Python API,
                employing the MiChroM (Minimal Chromatin Model) energy function.
      The chromatin dynamics simulations generate an ensemble of 3D chromosomal structures
      that are consistent with experimental Hi-C maps, also allows simulations of a single
                 or multiple chromosome chain using High-Performance Computing
                            in different platforms (GPUs and CPUs).
         OpenMiChroM documentation is available at https://open-michrom.readthedocs.io

         OpenMiChroM is described in: Oliveira Junior, A. B & Contessoto, V, G et. al.
      A Scalable Computational Approach for Simulating Complexes of Multiple Chromosomes.
                  Journal of Molecular Biology. doi:10.1016/j.jmb.2020.10.034.
                                              and
                                 Oliveira Junior, A. B. et al.
     Chromosome Modeling on Downsampled Hi-C Maps Enhances the Compartmentalization Signal.
                        J. Phys. Chem. B, doi:10.1021/acs.jpcb.1c04174.

                    Copyright (c) 2022, The OpenMiChroM development team at
                                        Rice University
    ***************************************************************************************

Now you need to setup the platform that you will use, the options are:

platform=”cuda” (remember that you need to install CUDA in your system) GPU=”0” (optional) (if you have more than one GPU device, you can set which gpu you want [“0”, “1”,…,”n”]) platform=”cpu” platform=”opencl”

[67]:

sim.setup(platform="CUDA")

Set the folder name where the output will be saved

[68]:

sim.saveFolder('iteration_0')

The next step is to setup your chromosome sequence and initial configuration

[69]:

mychro = sim.createSpringSpiral(ChromSeq="input/seq_chr10_100k.txt")

Load the initial structure into “sim” object

[70]:

sim.loadStructure(mychro, center=True)

Now it is time to include the force field in the simulation object “sim”

Lets separate forces in two sets:

Homopolymer Potentials

[71]:

sim.addFENEBonds(kfb=30.0)
sim.addAngles(ka=2.0)
sim.addRepulsiveSoftCore(Ecut=4.0)

Chromosome Potentials

In this tutorial, it is used the CustomTypes potential. Here we need to pass a file that contains a matrix of interactions for each other different type of chromosome. To check that, you can look on the documentation https://open-michrom.readthedocs.io/en/latest/OpenMiChroM.html#OpenMiChroM.ChromDynamics.MiChroM.addCustomTypes

[72]:

sim.addCustomTypes(mu=3.22, rc = 1.78, TypesTable='input/lambda_0')

Note: these valeus for mu and rc were calculated for human GM12878 cells and can be changed for other species.

The last potential to be added is the spherical restraint in order to collapse the initial structure

[73]:

sim.addFlatBottomHarmonic(kr=5*10**-3, n_rad=8.0)

Now we will run a short simulation in order to get a collapsed structure.

There are two variables that control the chromosomes simulation steps:

block: The number of steps performed in each cycle (n_Blocks) n_blocks: The number of blocks that will be perfomed.

In this example, to perfom the collapsing we will run \(5\times10^2 \times 10^3 = 5\times10^5\) steps

[74]:

block = 5*10**2
n_blocks = 10**3

We can save the radius of gyration of each block to observe the convergence into the collapsed state (the time required here depends on the size of your chromosome)

[75]:

rg = []

[76]:

for _ in range(n_blocks):
    sim.runSimBlock(block, increment=False)
    rg.append(sim.chromRG())

#save a collapsed structure in pdb format for inspection
sim.saveStructure(mode = 'pdb')

Number of exceptions: 1355
adding force  FENEBond 0
adding force  AngleForce 1
Add exclusions for RepulsiveSoftCore force
adding force  RepulsiveSoftCore 2
Add exclusions for CustomTypes force
adding force  CustomTypes 3
adding force  FlatBottomHarmonic 4
Positions...
 loaded!
potential energy is 31.227323
bl=0 pos[1]=[105.7 -3.8 0.6] dr=1.24 t=0.0ps kin=1.55 pot=31.48 Rg=72.954 SPS=16053
bl=0 pos[1]=[104.7 -3.3 2.7] dr=1.95 t=0.0ps kin=1.67 pot=31.18 Rg=71.676 SPS=17587
bl=0 pos[1]=[102.4 -4.1 4.1] dr=2.00 t=0.0ps kin=1.74 pot=30.82 Rg=70.219 SPS=17438
bl=0 pos[1]=[99.7 -6.1 5.9] dr=1.95 t=0.0ps kin=1.76 pot=30.50 Rg=68.733 SPS=17462
bl=0 pos[1]=[96.7 -5.9 6.4] dr=1.93 t=0.0ps kin=1.89 pot=30.09 Rg=67.232 SPS=17478
bl=0 pos[1]=[93.7 -5.8 7.8] dr=1.98 t=0.0ps kin=1.89 pot=29.68 Rg=65.681 SPS=17308
bl=0 pos[1]=[90.9 -5.0 7.4] dr=1.96 t=0.0ps kin=1.93 pot=29.34 Rg=64.178 SPS=20073
bl=0 pos[1]=[90.9 -5.7 6.8] dr=1.93 t=0.0ps kin=1.90 pot=28.93 Rg=62.698 SPS=20742
bl=0 pos[1]=[90.4 -5.2 7.1] dr=1.88 t=0.0ps kin=1.88 pot=28.57 Rg=61.274 SPS=21052
bl=0 pos[1]=[88.3 -5.7 5.6] dr=1.85 t=0.0ps kin=1.90 pot=28.18 Rg=59.906 SPS=21011
bl=0 pos[1]=[86.5 -6.3 4.5] dr=1.85 t=0.0ps kin=1.86 pot=27.88 Rg=58.504 SPS=20696
bl=0 pos[1]=[85.7 -6.0 4.8] dr=1.86 t=0.0ps kin=1.91 pot=27.55 Rg=57.112 SPS=20802
bl=0 pos[1]=[84.9 -5.2 5.0] dr=1.83 t=0.0ps kin=1.87 pot=27.24 Rg=55.765 SPS=20556
bl=0 pos[1]=[83.2 -3.1 4.6] dr=1.83 t=0.0ps kin=1.85 pot=26.95 Rg=54.445 SPS=20424
bl=0 pos[1]=[80.9 -1.8 4.1] dr=1.80 t=0.0ps kin=1.83 pot=26.61 Rg=53.144 SPS=20322
bl=0 pos[1]=[77.8 -0.5 4.9] dr=1.77 t=0.0ps kin=1.78 pot=26.34 Rg=51.895 SPS=20342
bl=0 pos[1]=[75.9 -0.1 5.4] dr=1.71 t=0.0ps kin=1.76 pot=26.12 Rg=50.727 SPS=19943
bl=0 pos[1]=[74.8 -0.5 6.2] dr=1.71 t=0.0ps kin=1.72 pot=25.86 Rg=49.591 SPS=20255
bl=0 pos[1]=[73.2 -3.1 6.0] dr=1.64 t=0.0ps kin=1.68 pot=25.67 Rg=48.524 SPS=20553
bl=0 pos[1]=[69.8 -3.3 4.6] dr=1.59 t=0.0ps kin=1.69 pot=25.46 Rg=47.505 SPS=20516
bl=0 pos[1]=[67.7 -1.6 2.8] dr=1.58 t=0.0ps kin=1.69 pot=25.25 Rg=46.517 SPS=20445
bl=0 pos[1]=[66.4 -1.7 3.2] dr=1.58 t=0.0ps kin=1.68 pot=25.08 Rg=45.541 SPS=20674
bl=0 pos[1]=[65.8 -1.9 2.3] dr=1.58 t=0.0ps kin=1.74 pot=24.88 Rg=44.574 SPS=20586
bl=0 pos[1]=[64.6 -3.0 2.1] dr=1.63 t=0.0ps kin=1.75 pot=24.74 Rg=43.588 SPS=20882
bl=0 pos[1]=[62.6 -1.6 2.6] dr=1.57 t=0.0ps kin=1.68 pot=24.58 Rg=42.626 SPS=20777
bl=0 pos[1]=[59.4 -0.7 1.9] dr=1.54 t=0.0ps kin=1.71 pot=24.43 Rg=41.684 SPS=21043
bl=0 pos[1]=[56.3 -0.1 2.6] dr=1.59 t=0.0ps kin=1.67 pot=24.23 Rg=40.783 SPS=20794
bl=0 pos[1]=[56.4 1.3 2.5] dr=1.51 t=0.0ps kin=1.66 pot=24.13 Rg=39.934 SPS=20599
bl=0 pos[1]=[56.5 0.9 2.5] dr=1.50 t=0.0ps kin=1.68 pot=23.95 Rg=39.109 SPS=20443
bl=0 pos[1]=[54.5 1.3 1.8] dr=1.50 t=0.0ps kin=1.60 pot=23.83 Rg=38.300 SPS=20511
bl=0 pos[1]=[54.2 2.8 1.1] dr=1.47 t=0.0ps kin=1.55 pot=23.75 Rg=37.514 SPS=20346
bl=0 pos[1]=[51.4 3.3 0.5] dr=1.48 t=0.0ps kin=1.61 pot=23.61 Rg=36.738 SPS=20361
bl=0 pos[1]=[49.0 4.5 0.9] dr=1.46 t=0.0ps kin=1.61 pot=23.49 Rg=36.006 SPS=20278
bl=0 pos[1]=[48.1 5.6 2.6] dr=1.48 t=0.0ps kin=1.64 pot=23.39 Rg=35.257 SPS=21124
bl=0 pos[1]=[47.2 6.5 1.9] dr=1.47 t=0.0ps kin=1.60 pot=23.33 Rg=34.527 SPS=21190
bl=0 pos[1]=[47.2 7.4 3.3] dr=1.48 t=0.0ps kin=1.61 pot=23.21 Rg=33.826 SPS=21369
bl=0 pos[1]=[46.1 6.1 3.3] dr=1.45 t=0.0ps kin=1.61 pot=23.14 Rg=33.181 SPS=21381
bl=0 pos[1]=[44.4 5.7 2.2] dr=1.41 t=0.0ps kin=1.55 pot=23.09 Rg=32.575 SPS=20543
bl=0 pos[1]=[43.5 5.5 0.8] dr=1.42 t=0.0ps kin=1.57 pot=23.02 Rg=31.994 SPS=21699
bl=0 pos[1]=[43.3 4.5 1.3] dr=1.41 t=0.0ps kin=1.53 pot=22.91 Rg=31.410 SPS=22004
bl=0 pos[1]=[43.0 5.8 1.0] dr=1.39 t=0.0ps kin=1.53 pot=22.88 Rg=30.837 SPS=21872
bl=0 pos[1]=[41.0 5.3 1.0] dr=1.37 t=0.0ps kin=1.58 pot=22.84 Rg=30.284 SPS=21754
bl=0 pos[1]=[41.1 6.2 0.2] dr=1.39 t=0.0ps kin=1.58 pot=22.77 Rg=29.678 SPS=21526
bl=0 pos[1]=[38.8 7.6 1.4] dr=1.39 t=0.0ps kin=1.57 pot=22.67 Rg=29.088 SPS=21322
bl=0 pos[1]=[37.9 6.9 0.3] dr=1.39 t=0.0ps kin=1.53 pot=22.60 Rg=28.531 SPS=21037
bl=0 pos[1]=[35.4 6.7 -0.2] dr=1.36 t=0.0ps kin=1.51 pot=22.56 Rg=27.994 SPS=20126
bl=0 pos[1]=[35.2 6.2 -0.6] dr=1.34 t=0.0ps kin=1.54 pot=22.50 Rg=27.529 SPS=20043
bl=0 pos[1]=[35.8 6.1 -0.9] dr=1.37 t=0.0ps kin=1.55 pot=22.42 Rg=27.110 SPS=19473
bl=0 pos[1]=[36.2 4.8 -1.7] dr=1.35 t=0.0ps kin=1.57 pot=22.42 Rg=26.666 SPS=20077
bl=0 pos[1]=[34.2 3.6 -2.7] dr=1.37 t=0.0ps kin=1.53 pot=22.38 Rg=26.215 SPS=19428
bl=0 pos[1]=[32.5 3.4 -1.7] dr=1.37 t=0.0ps kin=1.55 pot=22.35 Rg=25.809 SPS=19760
bl=0 pos[1]=[31.7 4.7 0.5] dr=1.36 t=0.0ps kin=1.58 pot=22.31 Rg=25.365 SPS=19803
bl=0 pos[1]=[31.4 4.6 0.8] dr=1.34 t=0.0ps kin=1.47 pot=22.30 Rg=24.931 SPS=19562
bl=0 pos[1]=[32.0 3.3 1.3] dr=1.29 t=0.0ps kin=1.48 pot=22.20 Rg=24.557 SPS=19850
bl=0 pos[1]=[30.8 4.8 0.4] dr=1.27 t=0.0ps kin=1.48 pot=22.19 Rg=24.223 SPS=19773
bl=0 pos[1]=[30.3 5.9 -0.9] dr=1.27 t=0.0ps kin=1.45 pot=22.17 Rg=23.912 SPS=19531
bl=0 pos[1]=[29.3 6.3 -2.4] dr=1.24 t=0.0ps kin=1.51 pot=22.17 Rg=23.635 SPS=19462
bl=0 pos[1]=[27.8 7.0 -3.2] dr=1.29 t=0.0ps kin=1.52 pot=22.16 Rg=23.327 SPS=19794
bl=0 pos[1]=[26.9 6.3 -4.4] dr=1.34 t=0.0ps kin=1.56 pot=22.08 Rg=22.943 SPS=19187
bl=0 pos[1]=[25.5 6.3 -4.0] dr=1.35 t=0.0ps kin=1.54 pot=22.10 Rg=22.564 SPS=18919
bl=0 pos[1]=[24.4 5.4 -4.3] dr=1.30 t=0.0ps kin=1.54 pot=22.05 Rg=22.241 SPS=18440
bl=0 pos[1]=[25.0 4.8 -5.0] dr=1.30 t=0.0ps kin=1.50 pot=22.01 Rg=21.919 SPS=19122
bl=0 pos[1]=[24.6 4.5 -4.1] dr=1.28 t=0.0ps kin=1.50 pot=21.99 Rg=21.583 SPS=18991
bl=0 pos[1]=[25.8 4.4 -2.9] dr=1.31 t=0.0ps kin=1.54 pot=21.96 Rg=21.229 SPS=18880
bl=0 pos[1]=[25.0 2.8 -2.6] dr=1.30 t=0.0ps kin=1.51 pot=21.96 Rg=20.889 SPS=18808
bl=0 pos[1]=[23.7 1.5 -2.8] dr=1.34 t=0.0ps kin=1.53 pot=21.94 Rg=20.565 SPS=18725
bl=0 pos[1]=[23.0 1.1 -4.6] dr=1.33 t=0.0ps kin=1.58 pot=21.93 Rg=20.253 SPS=18933
bl=0 pos[1]=[23.7 1.2 -6.1] dr=1.30 t=0.0ps kin=1.54 pot=21.93 Rg=19.974 SPS=18705
bl=0 pos[1]=[23.7 2.1 -4.8] dr=1.29 t=0.0ps kin=1.55 pot=21.91 Rg=19.706 SPS=19759
bl=0 pos[1]=[24.8 3.6 -4.2] dr=1.29 t=0.0ps kin=1.59 pot=21.90 Rg=19.438 SPS=20760
bl=0 pos[1]=[23.9 3.7 -4.2] dr=1.30 t=0.0ps kin=1.55 pot=21.93 Rg=19.167 SPS=21955
bl=0 pos[1]=[22.8 4.1 -3.1] dr=1.29 t=0.0ps kin=1.57 pot=21.89 Rg=18.924 SPS=19747
bl=0 pos[1]=[22.8 3.5 -2.3] dr=1.29 t=0.0ps kin=1.54 pot=21.84 Rg=18.697 SPS=18997
bl=0 pos[1]=[22.2 0.7 -3.3] dr=1.30 t=0.0ps kin=1.51 pot=21.83 Rg=18.461 SPS=18783
bl=0 pos[1]=[19.7 -1.3 -2.3] dr=1.32 t=0.0ps kin=1.58 pot=21.83 Rg=18.232 SPS=18798
bl=0 pos[1]=[18.5 -0.1 -0.7] dr=1.30 t=0.0ps kin=1.51 pot=21.81 Rg=18.067 SPS=18970
bl=0 pos[1]=[18.5 1.2 0.3] dr=1.27 t=0.0ps kin=1.52 pot=21.81 Rg=17.905 SPS=18708
bl=0 pos[1]=[19.4 1.4 0.2] dr=1.28 t=0.0ps kin=1.50 pot=21.77 Rg=17.738 SPS=19149
bl=0 pos[1]=[20.1 0.8 0.3] dr=1.29 t=0.0ps kin=1.51 pot=21.80 Rg=17.597 SPS=19050
bl=0 pos[1]=[21.8 0.5 -0.1] dr=1.30 t=0.0ps kin=1.49 pot=21.80 Rg=17.476 SPS=19346
bl=0 pos[1]=[21.3 1.7 0.8] dr=1.29 t=0.0ps kin=1.54 pot=21.78 Rg=17.349 SPS=17803
...bl=0 pos[1]=[8.3 -5.0 6.4] dr=1.26 t=0.0ps kin=1.54 pot=21.58 Rg=11.712 SPS=18526

Some details about the output for each block performed:

bl=0 is the number of each block ran, in this case we set increment=False, so the number of steps is not accounted. pos[1]=[X,Y,Z] the spatial position for the first bead. dr=1.26 show the average positions displacement in each block (in units os sigma). t=0 the time step. in this case we set increment=False, so the number of steps is not accounted. kin=1.5 is the kinect energy of the system. pot=19.90 is the potential energy of the system. RG=7.654 is the radius of gyration in the end of each block. SPS=12312 is the steps per second of each block. A way to look how fast the computations are being performed.

Check the convergence of the radius of gyration:

[77]:

plt.plot(rg)

[77]:

[<matplotlib.lines.Line2D at 0x7f1325645dc0>]

../_images/Tutorials_Tutorial_MiChroM_Optimization_44_1.png

The next step is to remove the restraint force in order to run the sampling simulation

[78]:

sim.removeFlatBottomHarmonic()

Initiate the optimization object. In this tutorial, this object is named “opt”

[79]:

opt = CustomMiChroMTraining(ChromSeq="input/seq_chr10_100k.txt",
                            mu=3.22, rc = 1.78)

The next step is to perform a long simulation to feed the optimization parameters.

In order to get a good inversion calcultation, it is important to have around \(1\times10^5\) frames from different replicas. For example, \(20\) replicas of \(5000\) saved frames.

This can take some time, so in this tutorial we will use just 1 replica of \(5000\) frames saved every \(1000\) steps.

\(block = 1000\) \(n\_blocks = 5000\)

[80]:

block = 1000
n_blocks = 5000

[ ]:

for _ in range(n_blocks):
    sim.runSimBlock(block, increment=True) #perform 1 block of the simulation
    opt.probCalculation_types(sim.getPositions()) #feed the optimization with the last position

In the end of each replica simulation we need to save some important values required to calculate the inversion.

We are saving these files using the H5 compression because it is faster to write/read.

Note: attention to this step. We have these 4 files for each replica for each iteration step. Be organized!

[82]:

rep=1

with h5py.File(sim.folder + "/polds_type_" + str(rep)+".h5", 'w') as hf:
    hf.create_dataset("polds_type",  data=opt.polds_type)

with h5py.File(sim.folder + "/Bij_type_" + str(rep)+".h5", 'w') as hf:
    hf.create_dataset("Bij_type",  data=opt.Bij_type)

with h5py.File(sim.folder + "/Nframes_" + str(rep)+".h5", 'w') as hf:
    hf.create_dataset("Nframes",  data=opt.Nframes)

with h5py.File(sim.folder + "/Pold_" + str(rep)+".h5", 'w') as hf:
    hf.create_dataset("Pold",  data=opt.Pold)

The first part of the optimization is finished. Inside the output folder, for each iteration, there are these 4 files used in next step.

[83]:

%%bash
ls iteration_0/*.h5

iteration_0/Bij_type_1.h5
iteration_0/Nframes_1.h5
iteration_0/Pold_1.h5
iteration_0/polds_type_1.h5

The second part is the inversion, it is quite simple, just feed the optmization object with all replicas and make the inversion to get the new lambda file.

[84]:

opt2 = CustomMiChroMTraining(ChromSeq="input/seq_chr10_100k.txt",
                            mu=3.22, rc = 1.78)

[85]:

iterations = "iteration_0"
replicas=[1]
for replica in replicas:
    with h5py.File(iterations + "/polds_type_" + str(replica)+".h5", 'r') as hf:
        opt2.polds_type += hf['polds_type'][:]

    with h5py.File(iterations + "/Bij_type_" + str(replica)+".h5", 'r') as hf:
        opt2.Bij_type += hf['Bij_type'][:]

    with h5py.File(iterations + "/Nframes_" + str(replica)+".h5", 'r') as hf:
        opt2.Nframes +=hf['Nframes'][()]

    with h5py.File(iterations + "/Pold_" + str(replica)+".h5", 'r') as hf:
        opt2.Pold += hf['Pold'][:]

With the parameters fed with all replicas, we calculate the inversion and get the new lambdas

[86]:

lambdas = opt2.getLamb_types(exp_map="input/chr10_100k.dense")
print(lambdas)

[[823304.95121223 934817.42279105]
 [934817.42279105 398513.55099684]]

Now load the old lambda value to calculate the new (lambda_1, in this case) according the Newton Method:

\(\lambda_1 = \lambda_0 - \delta\times\lambda_{actual}\)

\(\delta\) is the learning rate or damp, it can be adjusted in order to get values between [-1:0]. The average value of MiChroM’s parameters for Human GM12878 is -0.3

[87]:

old = pd.read_csv("input/lambda_0", sep=None, engine='python')
lambda_old = old.values
seq = old.columns
print(lambda_old)

[[0 0]
 [0 0]]

[92]:

damp = 3*10**-7
lambda_new = lambda_old - damp*lambdas

[93]:

print(lambda_new)

[[-0.24699149 -0.28044523]
 [-0.28044523 -0.11955407]]

Save the new lambda file (lambda_1.txt) and some files to analyze the inversion

[94]:

iteration = 0
#prob of A/B in sim and exp
phi_sim = opt2.calc_sim_phi_types().ravel()
phi_exp = opt2.calc_exp_phi_types().ravel()
np.savetxt('phi_sim_' + str(iteration), phi_sim)
np.savetxt('phi_exp', phi_exp)

plt.plot(phi_sim, 'o-', label="sim")
plt.plot(phi_exp, 'o-', label="exp")
plt.legend()

#HiC_simulate
dense_sim = opt2.getHiCSim()
np.savetxt('hic_sim_' + str(iteration)+'.dense', dense_sim)
plt.matshow(dense_sim, norm=mpl.colors.LogNorm(vmin=0.0001, vmax=dense_sim.max()),cmap="Reds")

#save the new lambda file
lamb = pd.DataFrame(lambda_new,columns=seq)
lamb.to_csv("input/lambda_1", index=False)

../_images/Tutorials_Tutorial_MiChroM_Optimization_66_0.png

../_images/Tutorials_Tutorial_MiChroM_Optimization_66_1.png

Redo these steps using the new lambda file (lambda_1) as input for Types potential in the next iteration.

The error can be calculate using phi_sim_1 and phi_exp by the equation: \(error = \frac{|\phi_{sim}-\phi_{exp}|}{\phi_{exp}}\)

This is appended in the file “error_and_pearsonC_types”

[95]:

%%bash
cat error_and_pearsonC_types

Error: 0.607774  Pearson's Correlation: 0.814562
Error: 0.615568  Pearson's Correlation: 0.854520
Error: 0.617701  Pearson's Correlation: 0.854046
Error: 0.619801  Pearson's Correlation: 0.854452
Error: 0.621203  Pearson's Correlation: 0.853993

We included a folder named “scripts” that has some .py and .slurm files that can be used to run this optimization in parallel using slurm clusters.