0%

使用Modeller补缺失残基

写这个是因为每次用的时候发现都忘记了,甚至不知道网页在哪,每次都要在一堆链接中反复点击,宛如蒙特卡洛模拟,有时候甚至还找不到想要的网站。因此我现在就把使用modeller补全缺失残基的网页放在这里。

我们以4GNX为例进行补全。首先4GNX在PDB中是个二聚体结构,我们只需要其中的一半的信息,所以我们删除了X,Y,Z和L链,改文件命名为4gnx_half.pdb。之后我们需要得到pdb中的序列信息。但是modeller只会得到存在的残基的序列。对于中间缺失的残基,虽然pdb文件的REMARK 465中记录了缺失的残基序号和类型,pdb文件中的SEQRES也记录了生物分子的序列信息,但是modeller并不会帮你从pdb中提取出来在序列上补充上,因为modeller认为这部分信息是不可靠的。并且有的经过处理的pdb甚至会丢失这些信息。

我们可以使用一下代码提取pdb文件中的序列信息:

1
2
3
4
5
6
7
8
9
from modeller import *
# Get the sequence of the 4gnx PDB file, and write to an alignment file
code = '4gnx'

e = Environ()
m = Model(e, file=code)
aln = Alignment(e)
aln.append_model(m, align_codes=code)
aln.write(file=code+'.seq')
用以上代码生成的序列文件4gnx_half.seq如下:
1
2
3
4
5
6
7
8
9
10
11
12
13

>P1;4gnx_half
structureX:4gnx_half: 1 :A:+688 :K:MOL_ID 1; MOLECULE PUTATIVE UNCHARACTERIZED PROTEIN; CHAIN A, X; ENGINEERED YES; MOL_ID 2; MOLECULE PUTATIVE UNCHARACTERIZED PROTEIN; CHAIN B, Y; ENGINEERED YES; MOL_ID 3; MOLECULE PUTATIVE UNCHARACTERIZED PROTEIN; CHAIN C, Z; ENGINEERED YES; MOL_ID 4; MOLECULE DNA (25-MER); CHAIN K, L; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC USTILAGO MAYDIS; ORGANISM_COMMON SMUT FUNGUS; ORGANISM_TAXID 237631; STRAIN 521 / FGSC 9021; GENE UM04165.1; EXPRESSION_SYSTEM SPODOPTERA FRUGIPERDA; EXPRESSION_SYSTEM_TAXID 7108; EXPRESSION_SYSTEM_CELL_LINE HI5; MOL_ID 2; ORGANISM_SCIENTIFIC USTILAGO MAYDIS; ORGANISM_COMMON SMUT FUNGUS; ORGANISM_TAXID 237631; STRAIN 521 / FGSC 9021; GENE UM02579.1; EXPRESSION_SYSTEM SPODOPTERA FRUGIPERDA; EXPRESSION_SYSTEM_TAXID 7108; EXPRESSION_SYSTEM_CELL_LINE HI5; MOL_ID 3; ORGANISM_SCIENTIFIC USTILAGO MAYDIS; ORGANISM_COMMON SMUT FUNGUS; ORGANISM_TAXID 237631; STRAIN 521 / FGSC 9021; GENE UM05156.1; EXPRESSION_SYSTEM SPODOPTERA FRUGIPERDA; EXPRESSION_SYSTEM_TAXID 7108; EXPRESSION_SYSTEM_CELL_LINE HI5; MOL_ID 4; SYNTHETIC YES; ORGANISM_SCIENTIFIC SYNTHETIC; ORGANISM_TAXID 32630: 2.80: 0.22
MEKPTPLINSSMLGQYVGQTVRIVGKVHKVTGNTLLMQTSDLGNVEIAMTPDSDVSSSTFVEVTGKVSDAGSSFQ
ANQIREFTTVDVDLTLVENVVQISAAFPNLFSD/NTLRPVTIRQILNAEQPHPDAEFILDGAELGQLTFVAVVRN
ISRNATNVAYSVEDGTGQIEVRQWLDASEIRNNVYVRVLGTLKSFQNRRSISSGHMRPVIDYNEVMFHRLEAVHA
HLQVTR/IYPIEGLSPYQNRWTIKARVTSKSDIRHWSNQRGEGKLFSVNLLDDSGEIKATGFNDAVDRFYPLLQE
NHVYLISKARVNIAKKQFSNLQNEYEITFENSTEIEECTDATDVPEVKYEFVRINELESVEANQQCDVIGILDSY
GELSEIVSKASQRPVQKRELTLVDQGNRSVKLTLWGKTAETFPTNAGVDEKPVLAFKGVKVGDFGGRSLSMFSSS
TMLINPDITESHVLRGWYDNDGAHAQFQPYTNGGGAGANMAERRTIVQVKDENLGMSEKPDYFNVRATVVYIKQE
NLYYTACASEGCNKKVNLDHENNWRCEKCDRSYATPEYRYILSTNVADATGQMWLSGFNEDATQLIGMSAGELHK
LREESESEFSAALHRAANRMYMFNCRAKMDTFNDTARVRYTISRAAPVDFAKAGMELVDAIRAYM/ttttttttt
tttttttttttttttt*
REMARK 465SEQRES中我们可以知道缺失的残基是哪些,进而填入以上生成的序列中。这里我们要有一份用-填补缺失残基的模板序列和一份完整序列,这两个序列可以写在alignment.aln文件中:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
>P1;4gnx_half
structureX:4gnx_half: 1 :A:+688 :K:MOL_ID 1; MOLECULE PUTATIVE UNCHARACTERIZED PROTEIN; CHAIN A, X; ENGINEERED YES; MOL_ID 2; MOLECULE PUTATIVE UNCHARACTERIZED PROTEIN; CHAIN B, Y; ENGINEERED YES; MOL_ID 3; MOLECULE PUTATIVE UNCHARACTERIZED PROTEIN; CHAIN C, Z; ENGINEERED YES; MOL_ID 4; MOLECULE DNA (25-MER); CHAIN K, L; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC USTILAGO MAYDIS; ORGANISM_COMMON SMUT FUNGUS; ORGANISM_TAXID 237631; STRAIN 521 / FGSC 9021; GENE UM04165.1; EXPRESSION_SYSTEM SPODOPTERA FRUGIPERDA; EXPRESSION_SYSTEM_TAXID 7108; EXPRESSION_SYSTEM_CELL_LINE HI5; MOL_ID 2; ORGANISM_SCIENTIFIC USTILAGO MAYDIS; ORGANISM_COMMON SMUT FUNGUS; ORGANISM_TAXID 237631; STRAIN 521 / FGSC 9021; GENE UM02579.1; EXPRESSION_SYSTEM SPODOPTERA FRUGIPERDA; EXPRESSION_SYSTEM_TAXID 7108; EXPRESSION_SYSTEM_CELL_LINE HI5; MOL_ID 3; ORGANISM_SCIENTIFIC USTILAGO MAYDIS; ORGANISM_COMMON SMUT FUNGUS; ORGANISM_TAXID 237631; STRAIN 521 / FGSC 9021; GENE UM05156.1; EXPRESSION_SYSTEM SPODOPTERA FRUGIPERDA; EXPRESSION_SYSTEM_TAXID 7108; EXPRESSION_SYSTEM_CELL_LINE HI5; MOL_ID 4; SYNTHETIC YES; ORGANISM_SCIENTIFIC SYNTHETIC; ORGANISM_TAXID 32630: 2.80: 0.22
MEKPTPLINSSMLGQYVGQTVRIVGKVHKVTGNTLLMQTSDLGNVEIAMTPDSDVSSSTFVEVTGKVSDAGSSFQ
ANQIREFTTV----DVDLTLVENVVQISAAFPNLFSD/NTLRPVTIRQILNAEQPHPDAEFILDGAELGQLTFVA
VVRNISRNATNVAYSVEDGTGQIEVRQWLD--------ASEIRNNVYVRVLGTLKSFQNRRSISSGHMRPVIDYN
EVMFHRLEAVHAHLQVTR/IYPIEGLSPYQNRWTIKARVTSKSDIRHWSNQRGEGKLFSVNLLDDSGEIKATGFN
DAVDRFYPLLQENHVYLISKARVNIAKKQFSNLQNEYEITFENSTEIEECTDATDVPEVKYEFVRINELESVEAN
QQCDVIGILDSYGELSEIVSKASQRPVQKRELTLVDQGNRSVKLTLWGKTAETFPTNAGVDEKPVLAFKGVKVGD
FGGRSLSMFSSSTMLINPDITESHVLRGWYDNDGAHAQFQPYTN---------GGGAGANMAERRTIVQVKDENL
GMSEKPDYFNVRATVVYIKQENLYYTACASEGCNKKVNLDHENNWRCEKCDRSYATPEYRYILSTNVADATGQMW
LSGFNEDATQLIGMSAGELHKLREESESEFSAALHRAANRMYMFNCRAKMDTFNDTARVRYTISRAAPVDFAKAG
MELVDAIRAYM/ttttttttttttttttttttttttt*
>P1;4gnx_half_fill
sequence:::::::::
MEKPTPLINSSMLGQYVGQTVRIVGKVHKVTGNTLLMQTSDLGNVEIAMTPDSDVSSSTFVEVTGKVSDAGSSFQ
ANQIREFTTVDCGHDVDLTLVENVVQISAAFPNLFSD/NTLRPVTIRQILNAEQPHPDAEFILDGAELGQLTFVA
VVRNISRNATNVAYSVEDGTGQIEVRQWLDSSSDDSSKASEIRNNVYVRVLGTLKSFQNRRSISSGHMRPVIDYN
EVMFHRLEAVHAHLQVTR/IYPIEGLSPYQNRWTIKARVTSKSDIRHWSNQRGEGKLFSVNLLDDSGEIKATGFN
DAVDRFYPLLQENHVYLISKARVNIAKKQFSNLQNEYEITFENSTEIEECTDATDVPEVKYEFVRINELESVEAN
QQCDVIGILDSYGELSEIVSKASQRPVQKRELTLVDQGNRSVKLTLWGKTAETFPTNAGVDEKPVLAFKGVKVGD
FGGRSLSMFSSSTMLINPDITESHVLRGWYDNDGAHAQFQPYTNGGVGGGAMGGGGAGANMAERRTIVQVKDENL
GMSEKPDYFNVRATVVYIKQENLYYTACASEGCNKKVNLDHENNWRCEKCDRSYATPEYRYILSTNVADATGQMW
LSGFNEDATQLIGMSAGELHKLREESESEFSAALHRAANRMYMFNCRAKMDTFNDTARVRYTISRAAPVDFAKAG
MELVDAIRAYM/ttttttttttttttttttttttttt*
现在我们可以用Modeller中的'LoopModel' class生成所有的残基,并对loop区域进行优化,代码如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
from modeller import *
from modeller.automodel import * # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

a = LoopModel(env, alnfile = 'alignment.ali',
knowns = '4gnx_half', sequence = '4gnx_half_fill')
a.starting_model= 1
a.ending_model = 1

a.loop.starting_model = 1
a.loop.ending_model = 2
a.loop.md_level = refine.fast

a.make()
使用该代码会生成一个使用model生成的结构(a.starting_model = 1a.ending_model = 1)和两个使用loopmodel生成的结构(a.loop.starting_model = 1a.loop.ending_model = 2)。如果我们要生成更多的结构,那么我们可以把a.ending_modela.loop.ending_model设定为更大的值。

如果你不需要对loop进行优化,那么你可以选择AutoModel代替LoopModel,同时移除与loop相关的三个参数。

使用LoopModelAutoModel补残基时,默认所有的原子都可以移动,如果你想让不缺失的残基不被移动的话,你可以设置select_atoms方法。在Modeller中,残基序号是从1开始并且按顺序加一的,因此在写residue_range有可能需要重新编号。同时residue_range是包括最后一个列出的残基的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
from modeller import *
from modeller.automodel import * # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

class MyModel(AutoModel):
def select_atoms(self):
return Selection(self.residue_range('86:A', '89:A'),
self.residue_range('180:B', '187:B'),
self.residue_range('493:C', '501:C'))

a = MyModel(env, alnfile = 'alignment.ali',
knowns = '4gnx_half', sequence = '4gnx_half_fill')
a.starting_model= 1
a.ending_model = 1

a.make()

如果使用的是LoopModel,那么在使用以上的方式进行约束时,两个边界上的残基还是会被移动,因此还需要添加`select_loop_atoms进行限制。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
from modeller import *
from modeller.automodel import * # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

class MyModel(LoopModel):
def select_atoms(self):
return Selection(self.residue_range('86:A', '89:A'),
self.residue_range('180:B', '187:B'),
self.residue_range('493:C', '501:C'))
def select_loop_atoms(self):
return Selection(self.residue_range('86:A', '89:A'),
self.residue_range('180:B', '187:B'),
self.residue_range('493:C', '501:C'))

a = MyModel(env, alnfile = 'alignment.ali',
knowns = '4gnx_half', sequence = '4gnx_half_fill')
a.starting_model= 1
a.ending_model = 1

a.make()