Cite this article:
LU Bangli, CHEN Qingfeng, JIANG Jiawen, LUO Haiqiong. Protein Free Energy Prediction based on Sequence and Structure Features[J]. Guangxi Sciences, 2017, 24(3): 286-291.
Abstract:
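[Objective] Protein free energy not only accurately reflects protein interactions but is also of great help in drug design. It is therefore necessary to build an accurate regression model of protein free energy. [Methods] We collected 135 protein complexes and computed 600 features for them. Minimum redundancy maximum relevance (mRMR) feature selection was used to keep the features significantly correlated with protein free energy and to discard redundant ones, yielding a minimum-redundancy, maximum-relevance feature set. Six regression models were built on the selected features. The importance of the selected features was analyzed by removing them and comparing the resulting performance. Finally, the best model was chosen by comparing 10-fold cross-validation results and was used to predict protein free energy. [Results] In predicting the free energy of the 135 protein complexes, the model built in this study achieved a higher correlation coefficient and a lower mean absolute error than other methods. [Conclusion] The model selected by our method has better prediction accuracy than the models selected by other methods.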
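The mRMR selection step can be illustrated with a minimal pure-Python sketch. The abstract does not state which mRMR variant or discretization was used. This sketch therefore makes two assumptions:

- It uses the common difference (MID) form, which scores a candidate feature by its mutual information with the target minus its mean mutual information with the features already selected.
- Continuous values, such as free energies, are split into equal-width bins before mutual information is computed.

The names `discretize`, `mutual_information` and `mrmr_select` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_select(features, target, k, n_bins=3):
    """Greedy mRMR, difference (MID) form: relevance minus mean redundancy.

    features: dict of feature name -> list of floats (one value per complex)
    target:   list of floats (e.g. free energy per complex)
    Returns up to k feature names in the order they were selected.
    """
    disc = {name: discretize(col, n_bins) for name, col in features.items()}
    y = discretize(target, n_bins)
    # Relevance: mutual information between each feature and the target
    relevance = {name: mutual_information(col, y) for name, col in disc.items()}
    selected, remaining = [], set(disc)
    while remaining and len(selected) < k:
        scores = {}
        for name in sorted(remaining):  # sorted order makes ties deterministic
            # Redundancy: mean mutual information with already-selected features
            redundancy = (
                sum(mutual_information(disc[name], disc[s]) for s in selected) / len(selected)
                if selected
                else 0.0
            )
            scores[name] = relevance[name] - redundancy
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: "a_dup" is an exact copy of "a"; "b" carries different information
    feats = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],
        "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
        "b": [0, 0, 1, 1, 0, 0, 1, 1],
    }
    target = [0, 0, 1, 1, 2, 2, 3, 3]  # equals 2*a + b
    print(mrmr_select(feats, target, k=2, n_bins=4))  # -> ['a', 'b']
```

In the toy run, "a", "a_dup" and "b" are equally relevant to the target. After "a" is chosen, its exact copy "a_dup" scores about zero because it is fully redundant. The complementary feature "b" is picked instead, which is the behaviour the "minimum redundancy" half of the criterion is meant to produce.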
Key words: protein interaction; free energy; feature selection; regression model
DOI: 10.13656/j.cnki.gxkx.20170601.002
Received: 2017-03-25; Revised: 2017-05-24
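Funding: Supported by the National Natural Science Foundation of China (61363025) and the Key Project of the Guangxi Natural Science Foundation (2013GXNSFDA019029).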
Protein Free Energy Prediction based on Sequence and Structure Features
LU Bangli¹, CHEN Qingfeng¹˒², JIANG Jiawen¹, LUO Haiqiong³
(1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, 530004, China; 2. State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi, 530004, China; 3. School of Information and Management, Guangxi Medical University, Nanning, Guangxi, 530021, China)
Abstract:
[Objective] Protein free energy not only accurately reflects protein interactions but is also of great help in drug design and disease treatment. It is therefore necessary to establish an accurate regression model of protein free energy. [Methods] In this article, 135 protein complexes were collected and 600 features were calculated for them. The minimum redundancy maximum relevance (mRMR) algorithm was used to select features significantly related to protein free energy and to remove redundant features, yielding a minimum-redundancy, maximum-relevance feature set. Six regression models were built on the selected features. The importance of the features was further analyzed by comparing the change in performance when they were removed. The best model was chosen by comparing the results of 10-fold cross-validation and was then used to predict protein free energy. [Results] Compared with other methods, the model achieved a higher correlation coefficient and a lower mean absolute error in predicting the free energy of the 135 protein complexes. [Conclusion] The experimental results show that our method achieves better prediction accuracy than the other methods.
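The evaluation protocol (10-fold cross-validation scored by a correlation coefficient and the mean absolute error) can be sketched in pure Python. The abstract does not name the six regression models or the exact correlation measure. This sketch therefore makes three assumptions:

- A univariate least-squares line (`fit_univariate_ols`, a hypothetical stand-in) takes the place of the paper's models.
- The correlation is the Pearson coefficient.
- Both metrics are computed on the pooled out-of-fold predictions, rather than averaged per fold.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_univariate_ols(xs, ys):
    """Hypothetical stand-in model: least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def k_fold_cv(xs, ys, fit, k=10, seed=0):
    """k-fold CV; returns (Pearson r, MAE) over the pooled out-of-fold predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, nearly equal folds
    preds = [0.0] * len(xs)
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        # Train only on the other k-1 folds, then predict the held-out fold
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = model(xs[i])
    return pearson_r(preds, ys), mean_absolute_error(ys, preds)


if __name__ == "__main__":
    # Synthetic data for illustration only (not the paper's 135 complexes)
    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(135)]
    ys = [-0.8 * x - 5.0 + rng.gauss(0, 1) for x in xs]
    r, mae = k_fold_cv(xs, ys, fit_univariate_ols, k=10)
    print(f"10-fold CV: r = {r:.3f}, MAE = {mae:.3f}")
```

Every sample is predicted exactly once by a model that never saw it during training. This is why cross-validated r and MAE are a fairer basis for choosing among candidate models than training-set fit. The paper's feature-removal analysis could reuse the same `k_fold_cv` call: evaluate once with all selected features and once with a feature dropped, then compare the two scores.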
Key words: protein interaction; free energy; feature selection; regression model