Jun Miyake Laboratory
Foresight Deep Intelligence
Graduate School of Engineering, Osaka University
Figure: Phylogenetic separations projected by an LAE at six training epochs and the value of the quadratic error function along the training epochs
A total of 360 human mtDNA sequences were projected onto the three-dimensional subspace learned by an LAE at the 300th epoch. The dots in each projection represent the projected sequences, and the dot colors represent different human mtDNA haplogroups (L0, blue; L3, green; M, red; N, cyan; R, violet; U, yellow). The error curve shows the values of the quadratic error function along the training epochs. Suffixes were placed to associate the projections with the corresponding points on the error curve. The three gray arrows indicate the principal directions in the 1024-dimensional space projected onto the three-dimensional space learned by the LAE at the displayed training epoch. The PC1, PC2, and PC3 arrows (equal in length) denote the first, second, and third principal directions, respectively, projected onto the three-dimensional space learned by the LAE at the respective training epochs. Video of the cluster transition from the 1st to the 10,000th epoch.
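The training process described in the caption can be illustrated with a minimal sketch, reading "LAE" as a linear autoencoder (an assumption, since the abbreviation is not expanded here). The sketch trains an encoder and a decoder by plain gradient descent on the quadratic reconstruction error, records the error at every epoch, and projects the data onto the learned three-dimensional space. The toy data (30 points in 6 dimensions forming three clusters), the learning rate, the initialization, and all function names are hypothetical and are not taken from the laboratory's work, which used 1024-dimensional inputs.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def make_toy_data(n_per_cluster=10, seed=1):
    """Hypothetical stand-in for encoded sequences: three clusters in 6 dimensions."""
    rng = random.Random(seed)
    centers = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
    rows = []
    for center in centers:
        for _ in range(n_per_cluster):
            signal = [v + rng.gauss(0, 0.3) for v in center]
            noise = [rng.gauss(0, 0.1) for _ in range(3)]
            rows.append(signal + noise)
    # Center each column so the data have zero mean.
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def train_lae(x, k, epochs, lr, seed=0):
    """Gradient descent on E = (1/n) * sum ||x W_enc W_dec - x||^2."""
    rng = random.Random(seed)
    n, d = len(x), len(x[0])
    w_enc = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]  # d x k
    w_dec = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(k)]  # k x d
    history = []
    for _ in range(epochs):
        z = matmul(x, w_enc)        # n x k projection
        recon = matmul(z, w_dec)    # n x d reconstruction
        resid = [[r - v for r, v in zip(rr, xr)] for rr, xr in zip(recon, x)]
        history.append(sum(e * e for row in resid for e in row) / n)
        # Both gradients are computed from the current weights before updating.
        g_dec = matmul(transpose(z), resid)                            # k x d
        g_enc = matmul(transpose(x), matmul(resid, transpose(w_dec)))  # d x k
        scale = 2.0 * lr / n
        w_dec = [[w - scale * g for w, g in zip(wr, gr)] for wr, gr in zip(w_dec, g_dec)]
        w_enc = [[w - scale * g for w, g in zip(wr, gr)] for wr, gr in zip(w_enc, g_enc)]
    return w_enc, w_dec, history

x = make_toy_data()
w_enc, w_dec, history = train_lae(x, k=3, epochs=500, lr=0.02)
projected = matmul(x, w_enc)  # coordinates of each point in the learned 3-D space
print("error at first epoch:", history[0])
print("error at last epoch:", history[-1])
```

Plotting `projected` at several epochs, together with `history`, would give a toy analogue of the projections and the error curve in the figure.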
Autoencoder for DNA sequence analysis
We aim to develop a method that gives an overview of the nature of genomic sequences. Because genes consist of sequential combinations of many nucleotides, usually several hundred to several thousand, it is impossible to grasp their structure, nature, and meaning directly by our own intellectual ability. Certain special parts of genes are used as nameplates. However, such parts [in many cases SNPs (single nucleotide polymorphisms)] might not represent the structure of the entire sequence. The difficulty comes from the fact that the sequence is too long (large in bp) for either human intelligence or analytical science to grasp at a glance. Deep learning is a method of projecting a complex system onto another complex system that human intelligence can recognize more easily.
Graphical classification of DNA sequences of HLA alleles by deep learning
Human Cell, volume 31, pages 102–105 (2018)
Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using the artificial intelligence technique “Deep Learning (Stacked autoencoder)”. Nucleotide sequence data 822 bp in length, collected from the Immuno Polymorphism Database, were compressed to a two-dimensional representation and plotted. The profiles of the two-dimensional plots indicate that the alleles can be classified, because they form clusters. The two-dimensional plot of HLA-A DNAs gives a clear overview for characterizing the various alleles.
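Before an autoencoder can compress nucleotide sequences, they have to be turned into numeric vectors. The abstract does not say which encoding was used; the sketch below assumes one-hot encoding (four entries per base), which is a common choice but only an assumption here. The function name `one_hot` and the toy sequences are hypothetical and are not real HLA-A data.

```python
NUCLEOTIDES = "ACGT"

def one_hot(sequence):
    """Flatten a DNA string into a 0/1 vector with four entries per base."""
    vector = []
    for base in sequence.upper():
        # Symbols outside A, C, G, T (e.g. N or gaps) become all zeros in this sketch.
        vector.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vector

# Hypothetical toy sequences, used only to show the shape of the encoding.
toy_alleles = ["ACGTAC", "ACGTAA", "TCGTAC"]
encoded = [one_hot(s) for s in toy_alleles]

for seq, vec in zip(toy_alleles, encoded):
    print(seq, len(vec))
```

Under this assumption, an 822 bp sequence becomes a vector of 822 × 4 = 3288 entries, which an autoencoder with a two-unit bottleneck layer would then reduce to a point on a two-dimensional plot.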