OpenMiChroM¶
OpenMiChroM.ChromDynamics¶
The ChromDynamics
classes perform chromatin dynamics based on the compartment annotations sequence of chromosomes. The simulations can be performed either using the default parameters of MiChroM (Minimal Chromatin Model) or using custom values for the type-to-type and Ideal Chromosome parameters..
- class OpenMiChroM.ChromDynamics.MiChroM(time_step=0.01, collision_rate=0.1, temperature=1.0, verbose=False, velocity_reinitialize=True, name='Chromosome', length_scale=1.0, mass_scale=1.0)[source]¶
Bases:
object
The
MiChroM
class performs chromatin dynamics employing the default MiChroM energy function parameters for the type-to-type and Ideal Chromosome interactions.Details about the MiChroM (Minimal Chromatin Model) energy function and the default parameters are decribed in “Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173.”
The
MiChroM
sets the environment to start the chromatin dynamics simulations.- Parameters
time_step (float, required) – Simulation time step in units of \(\tau\). (Default value = 0.01).
collision_rate (float, required) – Friction/Damping constant in units of reciprocal time (\(1/\tau\)). (Default value = 0.1).
temperature (float, required) – Temperature in reduced units. (Default value = 1.0).
verbose (bool, optional) – Whether to output the information in the screen during the simulation. (Default value:
False
).velocity_reinitialize (bool, optional) – Reset/Reinitialize velocities if \(E_{kin}\) is greater than 5.0. (Default value:
True
).name (str) – Name used in the output files. (Default value: Chromosome).
length_scale (float, required) – Length scale used in the distances of the system in units of reduced length \(\sigma\). (Default value = 1.0).
mass_scale (float, required) – Mass scale used in units of \(\mu\). (Default value = 1.0).
- addAdditionalForce(forceFunction, **args)[source]¶
Add an additional force after the system has already been initialized.
- Parameters
forceFunciton (function, required) – Force function to be added. Example: addSphericalConfinementLJ
**args (collection of arguments, required) – Arguments of the function to add the force. Consult respective documentation.
- addAngles(ka=2.0)[source]¶
Adds an angular potential between bonds connecting beads \(i − 1, i\) and \(i, i + 1\) according to “Halverson, J.D., Lee, W.B., Grest, G.S., Grosberg, A.Y. and Kremer, K., 2011. Molecular dynamics simulation study of nonconcatenated ring polymers in a melt. I. Statics. The Journal of chemical physics, 134(20), p.204904”.
- Parameters
ka (float, required) – Angle potential coefficient. (Default value = 2.0).
- addBond(i, j, distance=None, kfb=30)[source]¶
Adds bonds between loci \(i\) and \(j\)
- Parameters
kfb (float, required) – Bond coefficient. (Default value = 30.0).
i (int, required) – Locus index i.
j (int, required) – Locus index j
This function assigns active force to monomers, as defined by the act_seq array. :param act_seq: This list should contain the same number of elements as the number of monomers, where the value at index i corresponds to the active force F for the bead i :type act_seq: list, required :param atol: tolerance for force values. If force is less than atol then it is ignored. :type atol: float, optional
Returns Error if act_seq is not the same length as the sequence file.
- addCustomIC(mu=3.22, rc=1.78, dinit=3, dend=200, IClist=None, CutoffDistance=3.0)[source]¶
Adds the Ideal Chromosome potential using custom values for interactions between beads separated by a genomic distance \(d\). The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
dinit (int, required) – The first neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential. (Default value = 3).
dend (int, required) – The last neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential. (Default value = 200).
IClist (file, optional) – A one-column text file containing the energy interaction values for loci i and j separated by a genomic distance \(d\). (Default value:
None
).
- addCustomTypes(name='CustomTypes', mu=3.22, rc=1.78, TypesTable=None, CutoffDistance=3.0)[source]¶
Adds the type-to-type potential using custom values for interactions between the chromatin types. The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
The function receives a txt/TSV/CSV file containing the upper triangular matrix of the type-to-type interactions. A file example can be found here.
- Parameters
name (string, required) – Name to customType Potential. (Default value = “CustomTypes”)
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
TypesTable (file, required) – A txt/TSV/CSV file containing the upper triangular matrix of the type-to-type interactions. (Default value:
None
).
- addFENEBonds(kfb=30.0)[source]¶
Adds FENE (Finite Extensible Nonlinear Elastic) bonds between neighbor loci \(i\) and \(i+1\) according to “Halverson, J.D., Lee, W.B., Grest, G.S., Grosberg, A.Y. and Kremer, K., 2011. Molecular dynamics simulation study of nonconcatenated ring polymers in a melt. I. Statics. The Journal of chemical physics, 134(20), p.204904”.
- Parameters
kfb (float, required) – Bond coefficient. (Default value = 30.0).
- addFlatBottomHarmonic(kr=0.005, n_rad=10.0)[source]¶
Sets a Flat-Bottom Harmonic potential to collapse the chromosome chain inside the nucleus wall. The potential is defined as: \(step(r-r0) * (kr/2)*(r-r0)^2\).
- Parameters
kr (float, required) – Spring constant. (Default value = 5e-3).
n_rad (float, required) – Nucleus wall radius in units of \(\sigma\). (Default value = 10.0).
- addHarmonicBond_ij(i, j, r0=1.0, kfb=30)[source]¶
Internal function used to add bonds between i and j monomers :param i: monomers to be bonded :type i: int, required :param j: monomers to be bonded :type j: int, required :param kfb: bond stiffness :type kfb: float, required :param r0: equilibrium distance for the bond :type r0: float, required
- addHarmonicBonds(kfb=30.0, r0=1.0)[source]¶
This function adds harmonic bonds to all the monomers within a chain :param kfb: bond stiffness :type kfb: float, required :param r0: equilibrium distance for the bond :type r0: float, required
- addIdealChromosome(mu=3.22, rc=1.78, Gamma1=- 0.03, Gamma2=- 0.351, Gamma3=- 3.727, dinit=3, dend=500, CutoffDistance=3.0)[source]¶
Adds the Ideal Chromosome potential for interactions between beads separated by a genomic distance \(d\) according to the MiChroM energy function parameters reported in “Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173”.
The set of parameters \(\{\gamma_d\}\) of the Ideal Chromosome potential is fitted in a function: \(\gamma(d) = \frac{\gamma_1}{\log{(d)}} +\frac{\gamma_2}{d} +\frac{\gamma_3}{d^2}\).
The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
Gamma1 (float, required) – Ideal Chromosome parameter. (Default value = -0.030).
Gamma2 (float, required) – Ideal Chromosome parameter. (Default value = -0.351).
Gamma3 (float, required) – Ideal Chromosome parameter. (Default value = -3.727).
dinit (int, required) – The first neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential. (Default value = 3).
dend (int, required) – The last neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential. (Default value = 500).
- addLoops(mu=3.22, rc=1.78, X=- 1.61299, looplists=None)[source]¶
Adds the Loops interactions according to the MiChroM energy function parameters reported in “Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173”.
The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
Note
For Multi-chain simulations, the ordering of the loop list files is important! The order of the files should be the same as used in the other functions.
- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
X (float, required) – Loop interaction parameter. (Default value = -1.612990).
looplists (file, optional) – A two-column text file containing the index i and j of a loci pair that form loop interactions. (Default value:
None
).
- addMultiChainIC(mu=3.22, rc=1.78, Gamma1=- 0.03, Gamma2=- 0.351, Gamma3=- 3.727, dinit=3, dend=500, chainIndex=0, CutoffDistance=3.0)[source]¶
Adds the Ideal Chromosome potential for multiple chromosome simulations. The interactions between beads separated by a genomic distance \(d\) is applied according to the MiChroM energy function parameters reported in “Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173”.
The set of parameters \(\{\gamma_d\}\) of the Ideal Chromosome potential is fitted in a function: \(\gamma(d) = \frac{\gamma_1}{\log{(d)}} +\frac{\gamma_2}{d} +\frac{\gamma_3}{d^2}\).
The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
Gamma1 (float, required) – Ideal Chromosome parameter. (Default value = -0.030).
Gamma2 (float, required) – Ideal Chromosome parameter. (Default value = -0.351).
Gamma3 (float, required) – Ideal Chromosome parameter. (Default value = -3.727).
dinit (int, required) – The first neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential. (Default value = 3).
dend (int, required) – The last neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential. (Default value = 500).
chainIndex (integer, required) – The index of the chain to add the Ideal Chromosome potential. All chains are stored in
self.chains
. (Default value:0
).
- addRepulsiveSoftCore(Ecut=4.0, CutoffDistance=3.0)[source]¶
Adds a soft-core repulsive interaction that allows chain crossing, which represents the activity of topoisomerase II. Details can be found in the following publications:
Oliveira Jr., A.B., Contessoto, V.G., Mello, M.F. and Onuchic, J.N., 2021. A scalable computational approach for simulating complexes of multiple chromosomes. Journal of Molecular Biology, 433(6), p.166700.
Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173.
Naumova, N., Imakaev, M., Fudenberg, G., Zhan, Y., Lajoie, B.R., Mirny, L.A. and Dekker, J., 2013. Organization of the mitotic chromosome. Science, 342(6161), pp.948-953.
- Parameters
Ecut (float, required) – Energy cost for the chain passing in units of \(k_{b}T\). (Default value = 4.0).
- addSelfAvoidance(Ecut=4.0, k_rep=20.0, r0=1.0)[source]¶
This adds Soft-core self avoidance between all non-bonded monomers. This force is well behaved across all distances (no diverging parts) :param Ecut: energy associated with full overlap between monomers :type Ecut: float, required :param k_rep: steepness of the sigmoid repulsive potential :type k_rep: float, required :param r0: distance from the center at which the sigmoid is half as strong :type r0: float, required
- addSphericalConfinementLJ(r='density', density=0.1)[source]¶
Sets the nucleus wall potential according to MiChroM Energy function. The confinement potential describes the interaction between the chromosome and a spherical wall.
- Parameters
r (float or str="density", optional) – Radius of the nucleus wall. If r=”density” requires a density value.
density (float, required if r=”density”) – Density of the chromosome beads inside the nucleus. (Default value = 0.1).
- addTypetoType(mu=3.22, rc=1.78)[source]¶
Adds the type-to-type interactions according to the MiChroM energy function parameters reported in “Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173”.
The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
- chromRG()[source]¶
Calculates the Radius of Gyration of a chromosome chain.
- Returns
Returns the Radius of Gyration in units of \(\sigma\)
- createLine(ChromSeq)[source]¶
Creates a straight line for the initial configuration of the chromosome polymer.
- Parameters
ChromSeq (file, required) – Chromatin sequence of types file. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB).
length_scale (float, required) – Length scale used in the distances of the system in units of reduced length \(\sigma\). (Default value = 1.0).
- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- createRandomWalk(ChromSeq=None)[source]¶
Creates a chromosome polymer chain with beads position based on a random walk.
Args:
- ChromSeq (file, required):
Chromatin sequence of types file. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB).
- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- createSpringSpiral(ChromSeq=None, isRing=False)[source]¶
Creates a spring-spiral-like shape for the initial configuration of the chromosome polymer.
- Parameters
ChromSeq (file, required) – Chromatin sequence of types file. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB).
isRing (bool, optional) – Whether the chromosome chain is circular or not (Used to simulate bacteria genome, for example). f
bool(isRing)
isTrue
, the first and last particles of the chain are linked, forming a ring. (Default value =False
).
- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- getLoops(looplists)[source]¶
Get the loop position (CTFC anchor points) for each chromosome.
Note
For Multi-chain simulations, the ordering of the loop list files is important! The order of the files should be the same as used in the other functions.
- Parameters
looplists (text file) – A two-column text file containing the index i and j of a loci pair that form loop interactions.
- getScaledData()[source]¶
Internal function for keeping the system in the simulation box if PBC is employed.
- getVelocities()[source]¶
- Returns
Returns an array of velocities.
- Return type
\((N, 3)\)
numpy.ndarray
- initStorage(filename, mode='w')[source]¶
Initializes the .cndb files to store the chromosome structures.
- Parameters
filename (str, required) – Filename of the cndb/h5dict storage file.
mode (str, required) –
‘w’ - Create file, truncate if exists. (Default value = w).
’w-‘ - Create file, fail if exists.
’r+’ - Continue saving the structures in the same file that must exist.
- initStructure(mode='auto', CoordFiles=None, ChromSeq=None, isRing=False)[source]¶
Creates the coordinates for the initial configuration of the chromosomal chains and sets their sequence information.
Args:
- mode (str, required):
‘auto’ - Creates a spring-spiral-like shape when a CoordFiles is not provided. If CoordFiles is provided, it loads the respective type of coordinate files (.ndb, .gro, or .pdb). (Default value = ‘auto’).
‘line’ - Creates a straight line for the initial configuration of the chromosome polymer. Can only be used to create single chains.
‘spring’ - Creates a spring-spiral-like shape for the initial configuration of the chromosome polymer. Can only be used to create single chains.
‘random’ - Creates a chromosome polymeric chain with beads positions based on a random walk. Can only be used to create single chains.
‘ndb’ - Loads a single or multiple .ndb files and gets the position and types of the chromosome beads.
‘pdb’ - Loads a single or multiple .pdb files and gets the position and types of the chromosome beads.
‘gro’ - Loads a single or multiple .gro files and gets the position and types of the chromosome beads.
- CoordFiles (list of files, optional):
List of files with xyz information for each chromosomal chain. Accepts .ndb, .pdb, and .gro files. All files provided in the list must be in the same file format.
- ChromSeq (list of files, optional):
List of files with sequence information for each chromosomal chain. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB). If the chromatin types considered are different from the ones used in the original MiChroM (A1, A2, B1, B2, B3, B4, and NA), the sequence file must be provided when loading .pdb or .gro files, otherwise, all the chains will be defined with ‘NA’ type. For the .ndb files, the sequence used is the one provided in the file.
- isRing (bool, optional):
Whether the chromosome chain is circular or not (used to simulate bacteria genome, for example). To be used with the option
'random'
. Ifbool(isRing)
isTrue
, the first and last particles of the chain are linked, forming a ring. (Default value =False
).
- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- initVelocities(mult=1.0)[source]¶
Internal function that set the locus velocity to OpenMM system.
- Parameters
mult (float, optional) – Rescale initial velocities. (Default value = 1.0).
- loadGRO(GROfiles=None, ChromSeq=None)[source]¶
Loads a single or multiple .gro files and gets position and types of the chromosome beads. Initially, the MiChroM energy function was implemented in GROMACS. Details on how to run and use these files can be found at the Nucleome Data Bank.
Contessoto, V.G., Cheng, R.R., Hajitaheri, A., Dodero-Rojas, E., Mello, M.F., Lieberman-Aiden, E., Wolynes, P.G., Di Pierro, M. and Onuchic, J.N., 2021. The Nucleome Data Bank: web-based resources to simulate and analyze the three-dimensional genome. Nucleic Acids Research, 49(D1), pp.D172-D182.
- Parameters
GROfiles (list of files, required) – List with a single or multiple files in .gro file format. (Default value:
None
).ChromSeq (list of files, optional) – List of files with sequence information for each chromosomal chain. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB). If the chromatin types considered are different from the ones used in the original MiChroM (A1, A2, B1, B2, B3, B4, and NA), the sequence file must be provided, otherwise all the chains will be defined with ‘NA’ type.
- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- loadNDB(NDBfiles=None)[source]¶
Loads a single or multiple .ndb files and gets position and types of the chromosome beads. Details about the NDB file format can be found at the Nucleome Data Bank.
Contessoto, V.G., Cheng, R.R., Hajitaheri, A., Dodero-Rojas, E., Mello, M.F., Lieberman-Aiden, E., Wolynes, P.G., Di Pierro, M. and Onuchic, J.N., 2021. The Nucleome Data Bank: web-based resources to simulate and analyze the three-dimensional genome. Nucleic Acids Research, 49(D1), pp.D172-D182.
- Parameters
NDBfiles (file, required) – Single or multiple files in .ndb file format. (Default value:
None
).- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- loadPDB(PDBfiles=None, ChromSeq=None)[source]¶
Loads a single or multiple .pdb files and gets position and types of the chromosome beads. Here we consider the chromosome beads as the carbon-alpha to mimic a protein. This trick helps to use the standard macromolecules visualization software. The type-to-residue conversion follows: {‘ALA’:0, ‘ARG’:1, ‘ASP’:2, ‘GLU’:3,’GLY’:4, ‘LEU’ :5, ‘ASN’ :6}.
- Parameters
PDBfiles (list of files, required) – List with a single or multiple files in .pdb file format. (Default value:
None
).ChromSeq (list of files, optional) – List of files with sequence information for each chromosomal chain. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB). If the chromatin types considered are different from the ones used in the original MiChroM (A1, A2, B1, B2, B3, B4, and NA), the sequence file must be provided, otherwise all the chains will be defined with ‘NA’ type.
- Returns
Returns an array of positions.
- Return type
\((N, 3)\)
numpy.ndarray
- loadStructure(filename, center=True, masses=None)[source]¶
Loads the 3D position of each bead of the chromosome polymer in the OpenMM system platform.
- Parameters
center (bool, optional) – Whether to move the center of mass of the chromosome to the 3D position
[0, 0, 0]
before starting the simulation. (Default value:True
).masses (array, optional) – Masses of each chromosome bead measured in units of \(\mu\). (Default value:
None
).
- random_ChromSeq(Nbeads)[source]¶
Creates a random sequence of chromatin types for the chromosome beads.
- Parameters
Nbeads (int, required) – Number of beads of the chromosome polymer chain. (Default value = 1000).
- Returns
Returns an 1D array of a randomized chromatin type annotation sequence.
- Return type
\((N, 1)\)
numpy.ndarray
- randomizePositions()[source]¶
Runs automatically to offset the positions if it is an integer (int) variable.
- runSimBlock(steps=None, increment=True, num=None)[source]¶
Performs a block of simulation steps.
- Parameters
steps (int, required) – Number of steps to perform in the block.
increment (bool, optional) – Whether to increment the steps counter. Typically it is set
False
during the collapse or equilibration simulations. (Default value:True
).num (int or None, required) – The number of subblocks to split the steps of the primary block. (Default value:
None
).
- saveFolder(folder)[source]¶
Sets the folder path to save data.
- Parameters
folder (str, optional) – Folder path to save the simulation data. If the folder path does not exist, the function will create the directory.
- saveStructure(filename=None, mode='auto', h5dictKey='1', pdbGroups=None)[source]¶
Save the 3D position of each bead of the chromosome polymer over the chromatin dynamics simulations.
- Parameters
filename (str, required) – Filename of the storage file.
mode (str, required) –
‘ndb’ - The Nucleome Data Bank file format to save 3D structures of chromosomes. Please see the NDB - Nucleome Data Bank. for details.
’cndb’ - The compact ndb file format to save 3D structures of chromosomes. The binary format used the hdf5 - Hierarchical Data Format to store the data. Please see the NDB server for details. (Default value = cndb).
’pdb’ - The Protein Data Bank file format. Here, the chromosome is considered to be a protein where the locus is set at the carbon alpha position. This trick helps to use the standard macromolecules visualization software.
’gro’ - The GROMACS file format. Initially, the MiChroM energy function was implemented in GROMACS. Details on how to run and use these files can be found at the Nucleome Data Bank.
’xyz’ - A XYZ file format.
- setChains(chains=[(0, None, 0)])[source]¶
Sets configuration of the chains in the system. This information is later used for adding Bonds and Angles of the Homopolymer potential.
- Parameters
chains (list of tuples, optional) – The list of chains in the format [(start, end, isRing)]. isRing is a boolean whether the chromosome chain is circular or not (Used to simulate bacteria genome, for example). The particle range should be semi-open, i.e., a chain \((0,3,0)\) links the particles \(0\), \(1\), and \(2\). If
bool(isRing)
isTrue
, the first and last particles of the chain are linked, forming a ring. The default value links all particles of the system into one chain. (Default value:[(0, None, 0)]
).
- setFibPosition(positions, returnCM=False, factor=1.0)[source]¶
Distributes the center of mass of chromosomes on the surface of a sphere according to the Fibonacci Sphere algorithm.
- Parameters
positions (\((Nbeads, 3)\)
numpy.ndarray
, required) – The array of positions of the chromosome chains to be distributed in the sphere surface.returnCM (bool, optional) – Whether to return an array with the center of mass of the chromosomes. (Default value:
False
).factor (float, optional) –
Scale coefficient to be multiplied to the radius of the nucleus, determining the radius of the sphere in which the center of mass of chromosomes will be distributed. The radius of the nucleus is calculated based on the number of beads to generate a volume density of 0.1.
\(R_{sphere} = factor * R_{nucleus}\)
- Returns
Returns an array of positions to be loaded into OpenMM using the function
loadStructure
.- \((Nchains, 3)\)
numpy.ndarray
: Returns an array with the new coordinates of the center of mass of each chain.
- \((Nchains, 3)\)
- Return type
\((Nbeads, 3)\)
numpy.ndarray
- setPositions(beadsPos, random_offset=1e-05)[source]¶
Sets the 3D position of each bead of the chromosome polymer in the OpenMM system platform.
- Parameters
beadsPos (\((N, 3)\)
numpy.ndarray
) – Array of XYZ positions for each bead (locus) in the polymer model.random_offset (float, optional) – A small increment in the positions to avoid numeral instability and guarantee that a float parameter will be used. (Default value = 1e-5).
- setup(platform='CUDA', PBC=False, PBCbox=None, GPU='default', integrator='langevin', errorTol=None, precision='mixed', deviceIndex='0')[source]¶
Sets up the parameters of the simulation OpenMM platform.
- Parameters
platform (str, optional) – Platform to use in the simulations. Opitions are CUDA, OpenCL, HIP, CPU, Reference. (Default value: CUDA).
PBC (bool, optional) – Whether to use periodic boundary conditions. (Default value:
False
).PBCbox ([float,float,float], optional) – Define size of the bounding box for PBC. (Default value:
None
).GPU (\(0\) or \(1\), optional) – Switch to another GPU. Machines with one GPU automatically select the right GPU. Machines with two or more GPUs select GPU that is less used.
integrator (str) – Integrator to use in the simulations. Options are langevin, variableLangevin, verlet, variableVerlet and, brownian. (Default value: langevin).
verbose (bool, optional) – Whether to output the information in the screen during the simulation. (Default value:
False
).deviceIndex (str, optional) – Set of Platform device index IDs. Ex: 0,1,2 for the system to use the devices 0, 1 and 2. (Use only when GPU != default)
errorTol (float, required if integrator = variableLangevin) – Error tolerance parameter for variableLangevin integrator.
OpenMiChroM.Optimization¶
The Optimization
classes perform the energy function parameters training of the chromosomes based on experimental Hi-C data.
- class OpenMiChroM.Optimization.AdamTraining(mu=2.0, rc=2.0, eta=0.01, beta1=0.9, beta2=0.999, epsilon=1e-08, it=1)[source]¶
Bases:
object
The
AdamTraining
class performs the parameters training for each selected loci pair interaction.Details about the methodology are decribed in “Zhang, Bin, and Peter G. Wolynes. “Topology, structures, and energy landscapes of human chromosomes.” Proceedings of the National Academy of Sciences 112.19 (2015): 6062-6067.”
The
AdamTraining
class receive a Hi-C matrix (text file) as input. The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 2.0).
rc (float, required) – Parameter in the probability of crosslink function. (Default value = 2.0).
eta (float, required) – Learning rate applied in each step (Default value = 0.01).
beta1 (float, required) – The hyper-parameter of Adam are initial decay rates used when estimating the first and second moments of the gradient. (Default value = 0.9).
beta2 (float, required) – The hyper-parameter of Adam are initial decay rates used when estimating the first and second moments of the gradient. (Default value = 0.999).
it (int, required) – The iteration step
- getHiCexp(HiC, centerRemove=False, centrange=[0, 0], norm=True, cutoff_low=0.0, cutoff_high=1.0, KR=False, neighbors=0)[source]¶
Receives the experimental Hi-C map (Full dense matrix) in a text format and performs the data normalization from Hi-C frequency/counts/reads to probability.
- Parameters
HiC (file, required) – Experimental Hi-C map (Full dense matrix) in a text format.
centerRemove (bool, optional) – Whether to set the contact probability of the centromeric region to zero. (Default value:
False
).centrange (list, required if centerRemove =
True
)) – Range of the centromeric region, i.e.,centrange=[i,j]
, where i and j are the initial and final beads in the centromere. (Default value =[0,0]
).cutoff (float, optional) – Cutoff value for reducing the noise in the original data. Values lower than the cutoff are considered \(0.0\).
- getLamb(Lambdas, fixedPoints=None)[source]¶
Calculates the Lagrange multipliers of each pair of interaction and returns the matrix containing the energy values for the optimization step.
- Parameters
Lambdas (file, required) – The matrix containing the energies values used to make the simulation in that step.
fixedPoints (list, optional) – List of all pairs (i,j) of interactions that will remain unchanged throughout the optimization procedure.
- Returns
Returns an updated matrix of interactions between each pair of bead.
- Return type
\((N,N)\)
numpy.ndarray
- getPars(HiC, centerRemove=False, centrange=[0, 0], cutoff='deprecate', norm=True, cutoff_low=0.0, cutoff_high=1.0, KR=False, neighbors=0)[source]¶
Receives the experimental Hi-C map (Full dense matrix) in a text format and performs the data normalization from Hi-C frequency/counts/reads to probability.
- Parameters
HiC (file, required) – Experimental Hi-C map (Full dense matrix) in a text format.
centerRemove (bool, optional) – Whether to set the contact probability of the centromeric region to zero. (Default value:
False
).centrange (list, required if centerRemove =
True
)) – Range of the centromeric region, i.e.,centrange=[i,j]
, where i and j are the initial and final beads in the centromere. (Default value =[0,0]
).cutoff (float, optional) – Cutoff value for reducing the noise in the original data. Values lower than the cutoff are considered \(0.0\).
- class OpenMiChroM.Optimization.CustomMiChroMTraining(ChromSeq='chr_beads.txt', TypesTable=None, mu=3.22, rc=1.78, cutoff=0.0, IClist=None, dinit=3, dend=200)[source]¶
Bases:
object
The
CustomMiChroMTraining
class performs the parameters training employing MiChroM (Minimal Chromatin Model) energy function.Details about the methodology are decribed in “Di Pierro, Michele, et al. “Transferable model for chromosome architecture.” Proceedings of the National Academy of Sciences 113.43 (2016): 12168-12173.”
The
CustomMiChroMTraining
class receive a Hi-C matrix (text file) as input. The parameters \(\mu\) (mi) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.CustomMiChroMTraining
optimizes the type-to-type (Types) and the Ideal Chromosome (IC) potential parameters separately.- Parameters
ChromSeq (file, required) – Chromatin sequence of types file. The first column should contain the locus index. The second column should have the locus type annotation. A template of the chromatin sequence of types file can be found at the Nucleome Data Bank (NDB).
TypesTable (file, required) – A txt/TSV/CSV file containing the upper triangular matrix of the type-to-type interactions. (Default value:
None
).mu (float, required) – Parameter in the probability of crosslink function (Default value = 3.22, for human chromosomes in interphase).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\) (Default value = 1.78, for human chromosomes in interphase).
cutoff (float, optional) – Cutoff value for reducing the noise in the original data. Values lower than the cutoff are considered \(0.0\).
IClist (file, required for Ideal Chromosome training) – A one-column text file containing the energy interaction values for loci i and j separated by a genomic distance \(d\). The list should be at least of the size \(dend-dinit\). (Default value:
None
).dinit (int, required) – The first neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential for training. (Default value = 3).
dend (int, required) – The last neighbor in sequence separation (Genomic Distance) to be considered in the Ideal Chromosome potential for training. (Default value = 200).
- calc_phi_exp_IC()[source]¶
Calculates the contact probability as a function of the genomic distance from the experimental Hi-C for the Ideal Chromosome optimization.
- calc_phi_exp_types()[source]¶
Calculates the average of the contact probability for each chromatin type (compartment annotation) from the experimental Hi-C for the Types optimization.
- calc_phi_sim_IC()[source]¶
Calculates the contact probability as a function of the genomic distance from simulations for the Ideal Chromosome optimization.
- calc_phi_sim_types()[source]¶
Calculates the average of the contact probability for each chromatin type (compartment annotation) from simulation for the Types optimization.
- get_HiC_exp(filename)[source]¶
Receives the experimental Hi-C map (Full dense matrix) in a text format and performs the data normalization from Hi-C frequency/counts/reads to probability.
- get_Pearson()[source]¶
Calculates the Pearson’s Correlation between the experimental Hi-C used as a reference for the training and the in silico Hi-C obtained from the optimization step.
- get_PiPj_sim_IC()[source]¶
Normalizes the cross term of the Hessian by the number of frames in the simulation for the Ideal Chromosome optimization.
- get_PiPj_sim_types()[source]¶
Normalizes the cross term of the Hessian by the number of frames in the simulation for the Types optimization.
- get_lambdas_IC(exp_map='file.dense', damp=3e-07, write_error=True)[source]¶
Calculates the Lagrange multipliers for the Ideal Chromosome optimization and returns a array containing the energy values for the IC optimization step. :param exp_map: The experimental Hi-C map with the .dense file. (Default value:
file.dense
). :type exp_map: file, required :param damp: The learning parameter for the new lambda. (Default value = \(3*10**-7\)). :type damp: float :param dmax: The maximum distance in the sequence separation (Genomic Distance) to be considered for the convergence of the potential interations. (Default value = 200).The learning parameter for the new lambda. (Default value = \(3*10**-7\)).
- Parameters
write_error (boolean) – Flag to write the tolerance and Pearson’s correlation values. (Default value:
True
).
- get_lambdas_types(exp_map, damp=5e-07, write_error=True)[source]¶
Calculates the Lagrange multipliers of each type-to-type interaction and returns the matrix containing the energy values for the optimization step.
- class OpenMiChroM.Optimization.FullTraining(expHiC, mu=2.0, rc=2.5, cutoff=0.0, reduce=True, pair_h=2, c_h=0.1, pair_l=4, c_l=0.02)[source]¶
Bases:
object
The
FullTraining
class performs the parameters training for each selected loci pair interaction.Details about the methodology are decribed in “Zhang, Bin, and Peter G. Wolynes. “Topology, structures, and energy landscapes of human chromosomes.” Proceedings of the National Academy of Sciences 112.19 (2015): 6062-6067.”
The
FullTraining
class receive a Hi-C matrix (text file) as input. The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 2.0).
rc (float, required) – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 2.5).
cutoff (float, optional) – Cutoff value for reducing the noise in the original data. Values lower than the cutoff are considered \(0.0\).
reduce (bool, optional) – Whether to reduce the number of interactions to be considered in the inversion. If False, it will consider every possible interaction \((N*(N-1)/2)\). If True, it is necessary to give values for the lower and higher cutoffs. (Default value:
True
).pair_h (int, required if reduce =
True
) – Loci selection to apply the high-resolution cutoff. If pair_h = 2, the interaction in the high-resolution index grid \(2 : 2 : N × 2:2:N\) are subject to a cutoff value c_h, where N is the total number of monomers interactions (Default value = 2).c_h (float, required if reduce =
True
)) – The the high-resolution cutoff. (Default value = 0.1).pair_l (int, required if reduce =
True
)) – Loci selection to apply the high-resolution cutoff. If pair_l = 4, the interaction in the low-resolution index grid \(1:4:N×1:4:N\) are subject to a cutoff value c_l, where N is the total number of monomers interactions (Default value = 4).c_l (float, required if reduce =
True
)) – The the low-resolution cutoff. (Default value = 0.02).
- appCutoff(pair_h, c_h, pair_l, c_l)[source]¶
Applies the cutoff for low- and high-resolution values.
- getLambdas()[source]¶
Calculates the Lagrange multipliers of each selected interaction and returns the matrix containing the energy values for the optimization step.
- getPearson()[source]¶
Calculates the Pearson’s Correlation between the experimental Hi-C used as a reference for the training and the in silico Hi-C obtained from the optimization step.
- get_indices(hic)[source]¶
Receives non-zero interaction indices, i.e., the loci pair i and j which interaction will be optimized.
OpenMiChroM.CndbTools¶
The cndbTools
class perform analysis from cndb or ndb - (Nucleome Data Bank) file format for storing an ensemble of chromosomal 3D structures.
Details about the NDB/CNDB file format can be found at the Nucleome Data Bank.
- class OpenMiChroM.CndbTools.cndbTools[source]¶
Bases:
object
- compute_Chirality(xyz, neig_beads=4)[source]¶
Calculates the Chirality parameter \(\Psi\). Details are decribed in “Zhang, B. and Wolynes, P.G., 2016. Shape transitions and chiral symmetry breaking in the energy landscape of the mitotic chromosome. Physical review letters, 116(24), p.248101.”
- Parameters
xyz (\((frames, beadSelection, XYZ)\)
numpy.ndarray
, required) – Array of the 3D position of the selected beads for different frames extracted by using the :code: xyz() function.neig_beads (int, required) – Number of neighbor beads to consider in the calculation (Default value = 4).
- Returns
Returns the Chirality parameter \(\Psi\) for each bead.
- Return type
numpy.ndarray
- compute_GyrTensorEigs(xyz)[source]¶
Calculates the eigenvalues of the Gyration tensor: For a cloud of N points with positions: {[xi,yi,zi]},gyr tensor is a symmetric matrix defined as,
- gyr= (1/N) * [[sum_i(xi-xcm)(xi-xcm) sum_i(xi-xcm)(yi-ycm) sum_i(xi-xcm)(zi-zcm)],
[sum_i(yi-ycm)(xi-xcm) sum_i(yi-ycm)(yi-ycm) sum_i(yi-ycm)(zi-zcm)], [sum_i(zi-zcm)(xi-xcm) sum_i(zi-zcm)(yi-ycm) sum_i(zi-zcm)(zi-zcm)]]
the three non-negative eigenvalues of gyr are used to define shape parameters like radius of gyration, asphericity, etc
- Parameters
xyz (:math:`(frames, beadSelection, XYZ) – TxNx3), required): Array of the 3D position of the selected beads for different frames extracted by using the :code: xyz() function.
- Returns
- Tx3):
Returns the sorted eigenvalues of the Gyration Tensor.
- Return type
numpy.ndarray
(dim
- compute_MSD(xyz)[source]¶
Calculates the Mean-Squared Displacement using Fast-Fourier Transform. Uses Weiner-Kinchin theorem to compute the autocorrelation, and a recursion realtion from the following reference: see Sec. 4.2 in Calandrini V, et al. (2011) EDP Sciences (https://doi.org.10.1051/sfn/201112010). Also see this stackoverflow post: https://stackoverflow.com/questions/34222272/computing-mean-square-displacement-using-python-and-fft
- Parameters
xyz (:math:`(frames, beadSelection, XYZ) – TxNx3), required): Array of the 3D position of the selected beads for different frames extracted by using the :code: xyz() function.
- Returns
- NxT):
Returns the MSD of each particle over the trajectory.
- Return type
numpy.ndarray
(dim
- compute_RDP(xyz, radius=20.0, bins=200)[source]¶
Calculates the RDP - Radial Distribution Probability. Details can be found in the following publications:
Oliveira Jr., A.B., Contessoto, V.G., Mello, M.F. and Onuchic, J.N., 2021. A scalable computational approach for simulating complexes of multiple chromosomes. Journal of Molecular Biology, 433(6), p.166700.
Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G. and Onuchic, J.N., 2016. Transferable model for chromosome architecture. Proceedings of the National Academy of Sciences, 113(43), pp.12168-12173.
- Parameters
xyz (\((frames, beadSelection, XYZ)\)
numpy.ndarray
, required) – Array of the 3D position of the selected beads for different frames extracted by using the :code: xyz() function.radius (float, required) – Radius of the sphere in units of \(\sigma\) to be considered in the calculations. The radius value should be modified depending on your simulated chromosome length. (Default value = 20.0).
bins (int, required) – Number of slices to be considered as spherical shells. (Default value = 200).
- Returns
Returns the radius of each spherical shell in units of \(\sigma\). \((N, 1)\)
numpy.ndarray
:Returns the RDP - Radial Distribution Probability for each spherical shell.
- Return type
\((N, 1)\)
numpy.ndarray
- compute_RG(xyz)[source]¶
Calculates the Radius of Gyration.
- Parameters
xyz (:math:`(frames, beadSelection, XYZ) – TxNx3), required): Array of the 3D position of the selected beads for different frames extracted by using the :code: xyz() function.
- Returns
- Tx1):
Returns the Radius of Gyration in units of \(\sigma\).
- Return type
numpy.ndarray
(dim
- compute_RadNumDens(xyz, dr=1.0, ref='centroid', center=None)[source]¶
Calculates the radial number density of monomers; which when integrated over the volume (with the appropriate kernel: 4*pi*r^2) gives the total number of monomers.
- Parameters
xyz (:math:`(frames, beadSelection, XYZ) – TxNx3), required): Array of the 3D position of the selected beads for different frames extracted by using the :code: xyz() function.
dr (float, required) – mesh size of radius for calculating the radial distribution. can be arbitrarily small, but leads to empty bins for small values. bins are computed from the maximum values of radius and dr.
ref (string) –
defines reference for centering the disribution. It can take three values:
’origin’: radial distance is calculated from the center
’centroid’ (default value): radial distributioin is computed from the centroid of the cloud of points at each time step
’custom’: user defined center of reference. ‘center’ is required to be specified when ‘custom’ reference is chosen
center (list of float, len 3) – defines the reference point in custom reference. required when ref=’custom’
- Returns
the number density
- bins:class:numpy.ndarray:
bins corresponding to the number density
- Return type
num_density:class:numpy.ndarray
- load(filename)[source]¶
Receives the path to cndb or ndb file to perform analysis.
- Parameters
filename (file, required) – Path to cndb or ndb file. If an ndb file is given, it is converted to a cndb file and saved in the same directory.
- ndb2cndb(filename)[source]¶
Converts an ndb file format to cndb.
- Parameters
filename (path, required) – Path to the ndb file to be converted to cndb.
- traj2HiC(xyz, mu=3.22, rc=1.78)[source]¶
Calculates the in silico Hi-C maps (contact probability matrix) using a chromatin dyamics trajectory.
The parameters \(\mu\) (mu) and rc are part of the probability of crosslink function \(f(r_{i,j}) = \frac{1}{2}\left( 1 + tanh\left[\mu(r_c - r_{i,j}\right] \right)\), where \(r_{i,j}\) is the spatial distance between loci (beads) i and j.
- Parameters
mu (float, required) – Parameter in the probability of crosslink function. (Default value = 3.22).
rc – Parameter in the probability of crosslink function, \(f(rc) = 0.5\). (Default value = 1.78).
- xyz(frames=[1, None, 1], beadSelection=None, XYZ=[0, 1, 2])[source]¶
Get the selected beads’ 3D position from a cndb or ndb for multiple frames.
- Parameters
frames (list, required) – Define the range of frames that the position of the bead will get extracted. The range list is defined by
frames=[initial, final, step]
. (Default value: :code: [1,None,1], all frames)beadSelection (list of ints, required) – List of beads to extract the 3D position for each frame. The list is defined by :code: beadSelection=[0,1,2,…,N-1]. (Default value: :code: None, all beads)
XYZ (list, required) – List of the axis in the Cartesian coordinate system that the position of the bead will get extracted for each frame. The list is defined by :code: XYZ=[0,1,2]. where 0, 1 and 2 are the axis X, Y and Z, respectively. (Default value: :code: XYZ=[0,1,2])
- Returns
Returns an array of the 3D position of the selected beads for different frames.
- Return type
(\(N_{frames}\), \(N_{beads}\), 3)
numpy.ndarray