Jun Miyake Laboratory

Foresight Deep Intelligence

Graduate School of Engineering, Osaka University

Figure: Phylogenetic separations projected by a linear autoencoder (LAE) at six training epochs, and the value of the quadratic error function over the training epochs

A total of 360 human mtDNA sequences were projected onto the three-dimensional subspace learned by an LAE at the 300th epoch. The dots in each projection represent the projected sequences, and the dot colors represent different human mtDNA haplogroups (L0, blue; L3, green; M, red; N, cyan; R, violet; U, yellow). The error curve shows the value of the quadratic error function over the training epochs; suffixes associate each projection with the corresponding point on the error curve. The three gray arrows indicate the principal directions of the 1024-dimensional space, projected onto the three-dimensional space learned by the LAE at the displayed training epoch. The PC1, PC2, and PC3 arrows (equal in length) denote the first, second, and third principal directions, respectively. Video: cluster transition from the 1st to the 10,000th epoch.
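The comparison with principal directions in this caption relies on a well-known property of linear autoencoders trained on a quadratic error: the subspace they learn converges to the span of the leading principal directions. The sketch below illustrates this under stated assumptions. It assumes that LAE means a linear autoencoder without biases, and it uses synthetic 10-dimensional data rather than mtDNA, because the encoding that produces the 1024-dimensional inputs is not described here. The helper `train_lae`, the learning rate, the epoch count, and the initialization are hypothetical choices.

```python
import numpy as np

def train_lae(X, k, epochs=2000, lr=0.01, seed=0):
    """Hypothetical helper: linear autoencoder x -> W_dec @ (W_enc @ x), trained by
    full-batch gradient descent on E = 0.5 * mean ||W_dec W_enc x - x||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(k, d))   # encoder: d -> k
    W_dec = rng.normal(scale=0.1, size=(d, k))   # decoder: k -> d
    errors = []
    for _ in range(epochs):
        H = X @ W_enc.T                  # (n, k) codes in the learned subspace
        R = H @ W_dec.T - X              # (n, d) reconstruction residuals
        errors.append(0.5 * np.sum(R ** 2) / n)
        grad_dec = R.T @ H / n           # dE/dW_dec
        grad_enc = (R @ W_dec).T @ X / n  # dE/dW_enc (uses W_dec before its update)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, np.array(errors)

# Synthetic stand-in data (NOT mtDNA): 500 samples in 10 dimensions whose
# variance is concentrated in 3 randomly rotated directions.
rng = np.random.default_rng(42)
n, d, k = 500, 10, 3
scales = np.array([5.0, 4.0, 3.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2])
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
X = (rng.normal(size=(n, d)) * scales) @ rotation.T
X -= X.mean(axis=0)                      # PCA assumes centered data

W_enc, W_dec, errors = train_lae(X, k)

# Top-k principal directions of the same data.
_, S, Vt = np.linalg.svd(X, full_matrices=False)
pcs = Vt[:k].T                           # (d, k), columns = PC1..PC3

# Cosines of the principal angles between the LAE subspace and the PCA subspace;
# values near 1 mean the two subspaces coincide.
basis, _ = np.linalg.qr(W_dec)
cosines = np.linalg.svd(basis.T @ pcs, compute_uv=False)

# PC1..PC3 mapped into the 3-D code space and scaled to equal length,
# analogous to the gray arrows in the figure.
arrows = W_enc @ pcs
arrows /= np.linalg.norm(arrows, axis=0)
```

Tracking `errors` per epoch gives a curve analogous to the error curve in the figure. The final error should approach the rank-3 PCA optimum, which is `0.5 * np.sum(S[k:] ** 2) / n`.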

Autoencoder for DNA sequence analysis

We aim to develop a method for obtaining an overview of the nature of genomic sequences. Because genes consist of long sequential combinations of nucleotides, usually several hundred to several thousand, it is impossible to grasp their structure, nature, and meaning directly with our own intellectual ability. Some specific parts of genes are used as nameplates, that is, as identifying markers. However, such parts [in many cases, SNPs (single nucleotide polymorphisms)] may not represent the structure of the entire sequence. The difficulty comes from the fact that the sequences are too long (too many bp) for human intelligence or conventional analytical methods to grasp at once. Deep learning offers a way to project such a complex system onto another representation that human intelligence can recognize more easily.

-----------------------------------------------------

Graphical classification of DNA sequences of HLA alleles by deep learning

Human Cell, volume 31, pages 102–105 (2018)

Alleles of human leukocyte antigen (HLA)-A DNA are classified and represented graphically using a deep learning technique, the stacked autoencoder. Nucleotide sequences 822 bp in length, collected from the Immuno Polymorphism Database, were compressed into a two-dimensional representation and plotted. The two-dimensional plots show that the alleles form clusters, which indicates that they can be classified. The two-dimensional plot of HLA-A DNA thus gives a clear overview for characterizing the various alleles.
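The abstract does not say how the 822-bp sequences were converted into network inputs or which layer sizes were used. The sketch below therefore makes several assumptions. It one-hot encodes each sequence (4 values per position, so 3,288 inputs for 822 bp). It then trains a small tanh autoencoder with a two-dimensional linear bottleneck, using Adam on the squared reconstruction error. Stacked autoencoders are often pretrained layer by layer, but this sketch trains all layers jointly for brevity. `one_hot`, `forward`, `train_autoencoder_2d`, and all hyperparameters are hypothetical, and the toy sequences are random, not IPD alleles.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Flatten an aligned sequence into a 4*len(seq) vector; symbols other than
    A/C/G/T (gaps, ambiguity codes) become all-zero positions."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out.ravel()

def forward(params, X):
    """Encoder d -> hidden (tanh) -> 2 (linear code); the decoder mirrors it."""
    W1, b1, W2, b2, W3, b3, W4, b4 = params
    H1 = np.tanh(X @ W1 + b1)
    Z = H1 @ W2 + b2                     # 2-D code used for plotting
    H3 = np.tanh(Z @ W3 + b3)
    Y = H3 @ W4 + b4                     # reconstruction of X
    return H1, Z, H3, Y

def train_autoencoder_2d(X, hidden=64, epochs=500, lr=1e-3, seed=0):
    """Full-batch Adam on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], hidden, 2, hidden, X.shape[1]]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        params += [rng.normal(scale=1 / np.sqrt(fan_in), size=(fan_in, fan_out)),
                   np.zeros(fan_out)]
    mom = [np.zeros_like(p) for p in params]   # Adam first moments
    vel = [np.zeros_like(p) for p in params]   # Adam second moments
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        W1, _, W2, _, W3, _, W4, _ = params
        H1, Z, H3, Y = forward(params, X)
        R = Y - X
        losses.append(np.mean(R ** 2))
        # Backpropagation through the four dense layers.
        gY = 2 * R / R.size
        gH3 = (gY @ W4.T) * (1 - H3 ** 2)
        gZ = gH3 @ W3.T
        gH1 = (gZ @ W2.T) * (1 - H1 ** 2)
        grads = [X.T @ gH1, gH1.sum(0), H1.T @ gZ, gZ.sum(0),
                 Z.T @ gH3, gH3.sum(0), H3.T @ gY, gY.sum(0)]
        for p, g, m, v in zip(params, grads, mom, vel):
            m[:] = beta1 * m + (1 - beta1) * g
            v[:] = beta2 * v + (1 - beta2) * g ** 2
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)   # in-place update
    return params, losses

# Toy usage on synthetic data (NOT real HLA-A alleles): two random "parent"
# sequences, each mutated at 3 positions to form two groups of related sequences.
rng = np.random.default_rng(1)
L = 60
parents = [rng.choice(list(BASES), size=L) for _ in range(2)]
seqs = []
for parent in parents:
    for _ in range(10):
        child = parent.copy()
        pos = rng.choice(L, size=3, replace=False)
        child[pos] = rng.choice(list(BASES), size=3)
        seqs.append("".join(child))
X = np.stack([one_hot(s) for s in seqs])
params, losses = train_autoencoder_2d(X)
codes = forward(params, X)[1]            # (20, 2) coordinates to scatter-plot
```

One-hot encoding keeps every position as a separate input, so a single substitution moves a sequence only slightly in input space. This is one reason related alleles can end up close together in the 2-D code.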


Figure: Histogram-based document vector analysis of HLA-A using an autoencoder. The positions of the alleles differ from those in Fig. 2, but each allele appears much sharper and more clearly separated from the others. The meaning of the distances and directions between alleles is under investigation, but they may correlate with genetic differences in immune characteristics. Closer positions should indicate greater similarity between the sequences.
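The caption does not define how the "histogram-based document vector" is built. One common reading, assumed here, treats each sequence as a document whose words are overlapping k-mers and summarizes it as a k-mer frequency histogram. Such vectors could then be compressed by an autoencoder. `kmer_histogram`, `cosine_similarity`, and k = 3 are illustrative choices, not the authors' settings, and the toy sequences are not real alleles.

```python
from itertools import product

import numpy as np

def kmer_histogram(seq, k=3):
    """Relative frequencies of the 4**k possible overlapping k-mers in seq.
    k-mers containing symbols other than A/C/G/T are skipped."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = np.zeros(4 ** k)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

def cosine_similarity(u, v):
    """1.0 for identical k-mer profiles, smaller for more different ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy sequences (not real HLA-A alleles).
a = kmer_histogram("ACGTACGTACGGTACA")
b = kmer_histogram("ACGTACGTACGGTACT")   # one substitution at the end
c = kmer_histogram("TTTTGGGGCCCCAAAA")   # shares no 3-mers with a
```

Every histogram has 4**k entries regardless of sequence length, so sequences of different lengths map to vectors of the same size. The trade-off is that the position of each k-mer along the sequence is discarded.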