ECPred

A webserver for discriminating five electron complexes of transport proteins


Motivation

Every cell uses cellular respiration machinery to oxidize food molecules such as glucose to carbon dioxide and water, thereby obtaining energy in the form of adenosine triphosphate (ATP). This crucial process cannot occur without the electron transport chain, a series of five protein complexes embedded in the inner mitochondrial membrane. A variety of human diseases, such as Parkinson's disease, pulmonary hypertension, and Alzheimer's disease, involve the functional loss of these protein complexes. Investigating these electron complexes is therefore an ongoing concern for biologists seeking to understand the molecular mechanisms of important human diseases. In this research, we employed two representation learning methods, word embedding and transfer learning, to analyse electron complex sequences and create efficient feature sets, and then used a support vector machine to classify them.

Figure 1. The oxidative phosphorylation process with the participation of the 5 electron complexes.
Figure 2. The framework of our study.
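
Word-embedding methods treat a protein sequence as a sentence, so the sequence first has to be cut into "words". The sketch below shows one common way of doing this, overlapping windows of k residues. This page does not say how ECPred forms its words, so the function name to_words and the window size k=3 are assumptions made purely for illustration.

```python
def to_words(sequence, k=3):
    """Split a protein sequence into overlapping k-residue 'words'.

    Hypothetical helper: the window size and the overlapping scheme are
    illustrative assumptions, not the server's documented settings.
    """
    sequence = sequence.strip().upper()
    # A sequence shorter than k yields no words.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# First ten residues of the sample sequence A5GCQ9.
words = to_words("MGYVELIAAL", k=3)
print(words)
```

Each resulting word could then be mapped to a vector by an embedding model, and the vectors combined into a fixed-length feature set for a classifier.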

Result

On average, our final classification models achieved performance scores of 96%, 96.1%, 95.3%, and 0.86, respectively, on the cross-validation data. On the independent test data, the corresponding scores are 95.3%, 92.6%, 94%, and 0.87. Using representation learning methods, we show that a simple machine learning method is on par with an existing deep neural network method at categorizing electron complexes, while generating features much faster.

Dataset

The dataset used by this server comes from the work of Le et al., who retrieved it from UniProt and Gene Ontology. Details of the dataset are listed in the table below.

| Complex     | Original | Similarity < 30% (BLASTCLUST) | Benchmark datasets: Testing Data | Benchmark datasets: Training Data |
| Complex I   | 5,147    | 302                           | 252                              | 50                                |
| Complex II  | 154      | 30                            | 25                               | 5                                 |
| Complex III | 1,781    | 29                            | 25                               | 4                                 |
| Complex IV  | 2,026    | 153                           | 128                              | 25                                |
| Complex V   | 2,026    | 190                           | 38                               | 152                               |
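
The counts in the table can be sanity-checked: in every row, the two benchmark columns add up to the Similarity < 30% column. The short sketch below performs that check. The dictionary layout and the names DATASET and split_is_consistent are illustrative assumptions, and the order of the benchmark pair follows the table as printed.

```python
# Counts copied from the dataset table above.
# "benchmark" holds the two benchmark columns in the order they are printed.
DATASET = {
    "Complex I":   {"original": 5147, "similarity_lt_30": 302, "benchmark": (252, 50)},
    "Complex II":  {"original": 154,  "similarity_lt_30": 30,  "benchmark": (25, 5)},
    "Complex III": {"original": 1781, "similarity_lt_30": 29,  "benchmark": (25, 4)},
    "Complex IV":  {"original": 2026, "similarity_lt_30": 153, "benchmark": (128, 25)},
    "Complex V":   {"original": 2026, "similarity_lt_30": 190, "benchmark": (38, 152)},
}


def split_is_consistent(row):
    """Return True when the two benchmark columns sum to the filtered count."""
    return sum(row["benchmark"]) == row["similarity_lt_30"]


for name, row in DATASET.items():
    status = "ok" if split_is_consistent(row) else "MISMATCH"
    print(f"{name}: {row['benchmark']} vs {row['similarity_lt_30']} -> {status}")

# Total number of sequences remaining after similarity filtering.
total_filtered = sum(row["similarity_lt_30"] for row in DATASET.values())
print("Sequences after filtering:", total_filtered)
```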

If you would like to build your own model or evaluate ours, the dataset is available at the link below.

Download dataset.zip

Submission

To avoid errors, please submit your sequences in FASTA format (example FASTA files are provided). There are two ways to submit: paste the sequences into the text area, or upload a sequence file. You can submit a single FASTA file or multiple FASTA files. The results page shows the result for each sequence separately.

Sample FASTA sequence(s)
> A2XVZ1
MATTASPFLSPAKLSLERRLPRATWTARRSVRFPPVRAQDQQQQVKEEEEEAAVENLPPP
PQEEEQRRERKTRRQGPAQPLPVQPLAESKNMSREYGGQWLSCTTRHIRIYAAYINPETN
AFDQTQMDKLTLLLDPTDEFVWTDETCQKVYDEFQDLVDHYEGAELSEYTLRLIGSDLEH
FIRKLLYDGEIKYNMMSRVLNFSMGKPRIKFNSSQIPDVK
> P31039
MSGVAAVSRLWRARRLALTCTKWSAAWQTGTRSFHFTVDGNKRSSAKVSDAISAQYPVVD
HEFDAVVVGAGGAGLRAAFGLSEAGFNTACVTKLFPTRSHTVAAQGGINAALGNMEEDNW
RWHFYDTVKGSDWLGDQDAIHYMTEQAPASVVELENYGMPFSRTEDGKIYQRAFGGQSLK
FGKGGQAHRCCCVADRTGHSLLHTLYGRSLRYDTSYFVEYFALDLLMESGECRGVIALCI
EDGSIHRIRARNTVIATGGYGRTYFSCTSAHTSTGDGTAMVTRAGLPCQDLEFVQFHPTG
IYGAGCLITEGCRGEGGILINSQGERFMERYAPVAKDLASRDVVSRSMTLEIREGRGCGP
EKDHVYLQLHHLPPAQLAMRLPGISETAMIFAGVDVTKEPIPVLPTVHYNMGGIPTNYKG
QVLRHVNGQDQGVPGLYACGEAACASVHGANRLGANSLLDLVVFGRACALSIAESCRPGD
KVPSIKPNAGEESVMNLDKLRFANGSIRTSELRLNMQKSMQSHAAVFRVGSVLQEGCEKI
SSLYGDLRHLKTFDRGMVWNTDLVETLELQNLMLCALQTIYGAEARKESRGGPRREDFKE
RVDEYDYSKPIQGQQKKPFEQHWRKHTLSYVDIKTGKVTLEYRPVIDRTLNETDCATVPP
AIGSY
> O31214
MLASAGGYWPMSAQGVNKMRRRVLVAATSVVGAVGAGYALVPFVASMNPSARARAAGAPV
EADISKLEPGALLRVKWRGKPVWVVHRSPEMLAALSSNDPKLVDPTSEVPQQPDYCKNPT
RSIKPEYLVAIGICTHLGCSPTYRPEFGPDDLGSDWKGGFHCPCHGSRFDLAARVFKNVP
APTNLVIPKHVYLNDTTILIGEDRGSA
> E0TW67
MIFLFRALKPLLVLALLTVVFVLGGCSNASVLDPKGPVAEQQSDLILLSIGFMLFIVGVV
FVLFTIILVKYRDRKGKDNGSYNPKIHGNTFLEVVWTVIPILIVIALSVPTVQTIYSLEK
APEATKDKEPLVVHATSVDWKWVFSYPEQDIETVNYLNIPVDRPILFKISSADSMASLWI
PQLGGQKYAMAGMLMDQYLQADEVGTYQGRNANFTGEHFADQEFDVNAVTEKDFNSWVKK
TQNEAPKLTKEKYDQLMLPENVDELTFSSTHLKYVDHGQDAEYAMEARKRLGYQAVSPHS
KTDPFENVKENEFKKSDDTEE
> A5GCQ9
MGYVELIAALRRDGEEQLEKIRSDAEREAERVKGDASARIERLRAEYAERLASLEAAQAR
AILADAESKASSIRLATESALAVRLFLLARSSLHHLRDEGYEQLFADLVRELPPGEWRRV
VVNPADMALAARHFPNAEIVSHPAIVGGLEVSEEGGSISVVNTLEKRMERAWPELLPEIL
RDIYREL
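
Before submitting, it can help to check that a file really is well-formed FASTA. The following is a minimal sketch of such a check using only the Python standard library. It is not the server's own validation code: the function names parse_fasta and find_problems and the accepted residue alphabet are assumptions made for illustration.

```python
# Assumed alphabet: the 20 standard amino acids plus common ambiguity codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO")


def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Tolerate a space after '>' as in the sample sequences.
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_problems(records):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for identifier, sequence in records:
        label = identifier or "<no identifier>"
        if not identifier:
            problems.append("record with an empty identifier")
        if not sequence:
            problems.append(f"{label}: empty sequence")
        unexpected = sorted(set(sequence) - VALID_RESIDUES)
        if unexpected:
            problems.append(f"{label}: unexpected characters {unexpected}")
    return problems


sample = """> A5GCQ9
MGYVELIAALRRDGEEQLEKIRSDAEREAERVKGDASARIERLRAEYAERLASLEAAQAR
AILADAESKASSIRLATESALAVRLFLLARSSLHHLRDEGYEQLFADLVRELPPGEWRRV
VVNPADMALAARHFPNAEIVSHPAIVGGLEVSEEGGSISVVNTLEKRMERAWPELLPEIL
RDIYREL
"""

records = parse_fasta(sample)
for identifier, sequence in records:
    print(identifier, len(sequence))
print(find_problems(records))
```

The same two functions work for a file with several records, since each line starting with ">" begins a new sequence.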

Members

Yu-Yen Ou
Associate Professor

Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.

Trung-Duong Nguyen-Trinh
Research Scholar

Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.

Nguyen-Quoc-Khanh Le
Research Scholar

School of Humanities
Nanyang Technological University
48 Nanyang Ave, Singapore 6397983

Dinh-Van Phan
Research Scholar

Department of Statistics – Informatics
University of Economics, University of Danang
71 Ngu Hanh Son St, Danang, Vietnam 550000

Quang-Thai Ho
Research Scholar

Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.

Contact us


Department of Computer Science and Engineering
Graduate Program in Biomedical Informatics
Bioinformatics Laboratory (R1607B)
Address: No. 135, Yuandong Road, Chungli City, Taoyuan County, Taiwan 32003, R.O.C.
Tel: (03) 463-8800

If you have any problems with or suggestions for our website, feel free to contact us via email: [email protected]