Machine-Learning Based Toxin Classification

As part of my research experience, I worked on a bioinformatics project titled Machine-Learning Based Toxin Classification. The goal of the project was to use machine learning methods to classify different toxins, replacing complex wet-lab classification experiments and thereby lowering costs for pharmaceutical research.

The raw data are time series that record how the number of cells changes over time in different toxicant solutions, and our task was to classify the toxicants based on these curves. Before I joined the project, Dr. Yile Zhang had published a paper applying the wavelet transform, support vector machines (SVMs), and neural networks to this classification problem.

I introduced a method called the Shapelet Transform to this problem. It is a shape-based time series classification method, and I combined it with a random forest classifier. This approach reached 97% classification accuracy in some classification cases, an improvement over previously published state-of-the-art results, and it revealed a group of curve shapes that may have biological meaning.
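For readers unfamiliar with the technique, the core step of a shapelet transform can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes plain Euclidean distance without the z-normalization many implementations apply, and the function names and toy curves are hypothetical. Each curve is mapped to a feature vector holding its minimum sliding-window distance to each shapelet; shapelet discovery and the random forest stage (for example scikit-learn's `RandomForestClassifier` trained on these features) are not shown.

```python
import math
from typing import List, Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between the shapelet and any
    equal-length window of the series (sliding-window match).
    Assumption: no z-normalization of windows or shapelet."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, d)
    return best


def shapelet_transform(dataset: Sequence[Sequence[float]],
                       shapelets: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map each time series to one feature per shapelet (its best-match distance)."""
    return [[shapelet_distance(ts, sh) for sh in shapelets] for ts in dataset]


# Hypothetical toy cell-count curves and shapelets, for illustration only.
curves = [[0, 1, 2, 3, 2, 1], [3, 3, 3, 3, 3, 3]]
shapelets = [[1, 2, 3], [3, 3]]
features = shapelet_transform(curves, shapelets)
print(features)  # each row could then be fed to a classifier such as a random forest
```

A small distance means the curve contains a segment close to that shapelet, which is why the selected shapelets can be inspected as candidate biologically meaningful shapes.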