A web server for identifying enhancers and their strength using word embedding representations
In genetics, an enhancer is a short (50–1500 bp) region of DNA that plays an important role in gene expression, and thus in the production of RNA and proteins. Enhancers are cis-acting, and the proteins that bind them are commonly referred to as transcription factors. They can be located up to 1 Mbp (1,000,000 bp) away from a gene, or even on a different chromosome, upstream or downstream from the start site. The human genome contains a large number of enhancers, and enhancers are found in both eukaryotes and prokaryotes. Genetic variation in enhancers has been linked to many human diseases, including cancer and inflammatory bowel disease.
Because of their importance in genomics, the identification and classification of enhancers is a well-studied problem in biological research. Our idea is to transform enhancer sequences into vectors using word embedding and then classify them with a neural network. Our implementation used:
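To illustrate the general idea of treating a DNA sequence as a sentence of "words", here is a minimal sketch. It is not the server's actual code: the function names, the word length k = 3, the vector dimension, and the random embedding table are all assumptions made for illustration. A real system would use vectors learned from a large set of sequences.

```python
import random
from typing import Dict, List


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers, used here as 'words'."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding_table(words: List[str], dim: int = 4, seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical stand-in for a trained embedding: one random vector per distinct word."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in sorted(set(words))}


def sequence_vector(words: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the word vectors to obtain one fixed-length vector per sequence."""
    if not words:
        return [0.0] * dim
    total = [0.0] * dim
    for word in words:
        for j, value in enumerate(table[word]):
            total[j] += value
    return [t / len(words) for t in total]


words = kmer_words("GTGGCATAGTGG", k=3)
table = toy_embedding_table(words, dim=4)
vector = sequence_vector(words, table, dim=4)
print(words[:4])    # ['GTG', 'TGG', 'GGC', 'GCA']
print(len(vector))  # 4
```

The resulting fixed-length vector is the kind of input a classifier could consume. Averaging is only one of several ways to combine word vectors and is chosen here for brevity.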
By using our web server, users can identify enhancers and their strength easily, without any computational expertise. They only need to provide a sequence and submit it to our server; we then process it and return the results. Our workflow for processing an input sequence is as follows:
If you would like to build your own model or evaluate ours, please follow these links to download the benchmark datasets.
Cross-validation dataset
Independent dataset
To avoid errors, please submit sequences in FASTA format (example FASTA files are also provided). There are two ways to submit: paste the sequences into the text area, or upload a sequence file. Users can submit a single FASTA file or multiple FASTA files. The results page shows whether each sequence is an enhancer and, if so, its strength.
>CHR12_6645339_6645539
GTGGCATAGTGGGGTGGTGAATACCATGTACAAAGCTTGTGCCCAGACTGTGGGTGGCAG
TGCCCCACATGGCCGCTTCTCCTGGAAGGGCTTCGTATGACTGGGGGTGTTGGGCAGCCC
TGGAGCCTTCAGTTGCAGCCATGCCTTAAGCCAGGCCAGCCTGGCAGGGAAGCTCAAGGG
AGATAAAATTCAACCTCTTG
>CHRX_36223479_36223679
TACAAATTTGTTAAAGAGTGGTAATCTGATAAAGATAATAAGTGCACTTTGAGGAAGGTG
AGCTCTTGCTTTGAGAGTGAACTTGGTTATGGAAGTTGTGCTGAAGTTGAGGCTTAGGCT
GGCTCTAAAAGAAAGGAAAAATGTTGATGGCTAGAGGTTATCAGAGGAGAAGAATGTGTT
ATTGGTGAGTAACTTATCTT
>CHR12_46780933_46781133
TAAAAATTTTCCCATTTCATTTATACTCCAGTTTCAGTTCCGTTGATTCAAAGTTTTGGG
ACTTAGGACAAGTTTGCTAGCTTCTCTGAGCCTCAGTTTTTTAAACCGTTAAATAGGAAT
AACAGCATGCTGAGTGCCAAGAATTAAAGAAAAATTTATGTGAAAATATAGATTAGAAAG
AAAACTAATATAAATGCAGG
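Before submitting, it can help to check that a file really is in FASTA format. The sketch below shows one way to read records like those above and flag unexpected characters. The function names and the allowed alphabet (A, C, G, T, plus N for unknown bases) are assumptions for illustration; the server's own validation rules may differ.

```python
from typing import Dict, Iterable, Set

# Assumed alphabet: the four DNA bases plus N for an unknown base.
VALID_BASES = set("ACGTN")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("FASTA header is empty")
            if header in records:
                raise ValueError(f"duplicate FASTA header: {header}")
            records[header] = ""
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            # Sequence lines of one record are joined into a single string.
            records[header] += line.upper()
    return records


def invalid_bases(sequence: str) -> Set[str]:
    """Return the characters in a sequence that are outside the assumed alphabet."""
    return set(sequence) - VALID_BASES


# Short illustrative records (not taken from the benchmark datasets).
example = """>example_1
ACGTAC
GGTTAA
>example_2
ttnnga
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq), sorted(invalid_bases(seq)))
```

Each record is stored under its header line without the leading ">", so sequences wrapped across several lines are rejoined before any length or alphabet check.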
If you have any problems or suggestions regarding our website, feel free to contact us via email: [email protected]