Computational Biology | Proteomics | Data Analytics | Machine Learning | Statistical Analysis | Hypothesis Testing
I am a researcher with a strong interest in how data shape our lives. At the moment, I work in computational biology and biophysics, a field powered by statistical analysis, machine-learning approaches, and deterministic algorithms. I am experienced in proteomics, databases, machine learning algorithms, neural networks, data analysis pipelines, and data visualization using Python, R, and Tableau. I am also keenly interested in the energy and business sectors, where data-driven products can improve existing decision-making. I am motivated by digging into data: give me a dataset and a set of questions, and I’m eager to figure out what’s driving the numbers. I love solving problems with code and teaching myself new things. I particularly enjoy communicating through effective data visualizations such as dashboards.
Outside the office, I play the piano, compose music, cook, photograph nature, and try to master new juggling tricks. I really enjoy socializing and discussing a broad range of topics with people.
The aim of this project is to detect diabetes using medical measurements as features. The dataset contains 8 medical features (X): Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, and Age. Outcome is the binary target label (y). I used 8 different machine learning classifiers for diabetes classification. Logistic regression and neural networks provide the best performance based on 10-fold cross-validation of the dataset. Logistic regression also achieves a higher F1-score, which is a better metric for model evaluation. Judging from the confusion matrices, the decision tree is the most successful at detecting diabetic cases. Feature selection suggests that Glucose is the most important factor for the successful prediction of diabetes.
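As a reminder of how the two evaluation tools mentioned above relate, here is a minimal sketch that derives the F1-score and the recall from the counts of a binary confusion matrix. The function names and the toy labels are illustrative assumptions, not the project's code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means diabetic."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true diabetic cases that the model detects."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred), f1_score(y_true, y_pred), recall(y_true, y_pred))
```

Recall is the quantity that reflects "success in detecting diabetes", which is why a model can lead on recall in the confusion matrix without having the best F1-score.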
This project deals with classifying the bioactivity of drug candidates based on Lipinski molecular descriptors. The target protein is the tyrosine kinase ABL. Mutations in the ABL kinase are associated with chronic myelogenous leukemia (CML). The data were obtained from the ChEMBL database. Feature engineering (i.e. computing the Lipinski molecular descriptors) was performed on the SMILES strings obtained from ChEMBL. The project presents an extensive exploratory data analysis (EDA) and a general methodology for finding active drugs for a target protein.
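The Lipinski descriptors are the four quantities used in Lipinski's rule of five: molecular weight, logP, and the numbers of hydrogen-bond donors and acceptors. The sketch below counts rule-of-five violations from precomputed descriptor values. The class name, function name, and the two example compounds are hypothetical; in the project the values would be derived from SMILES strings with a cheminformatics toolkit, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipinskiDescriptors:
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def rule_of_five_violations(d: LipinskiDescriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


# Hypothetical descriptor values, for illustration only.
compound_a = LipinskiDescriptors(mol_weight=350.4, log_p=2.1, h_donors=2, h_acceptors=5)
compound_b = LipinskiDescriptors(mol_weight=720.9, log_p=6.3, h_donors=6, h_acceptors=9)

print(rule_of_five_violations(compound_a))  # no thresholds exceeded
print(rule_of_five_violations(compound_b))  # weight, logP and donors exceeded
```

In a classification setting the four descriptors would typically be used directly as numeric features, with the violation count serving as a quick drug-likeness check during EDA.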
This project classifies protein function from protein sequences. The function the project focuses on is ATP binding. Protein sequences and their functions were scraped from several biological databases, mainly UniProt. Sequences were truncated at 500 residues, and shorter sequences were artificially padded with ‘_’. This is a binary classification problem, and I used an artificial neural network (ANN) to classify the protein function. After hyperparameter tuning, I obtained an accuracy of 95% and a macro F1-score of 93%.
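The fixed-length preprocessing described above can be sketched as follows. The 500-residue limit and the ‘_’ padding character come from the description; the integer encoding, the alphabet ordering, and the function names are assumptions added for illustration.

```python
from typing import List

MAX_LEN = 500   # sequence length stated in the project description
PAD = "_"       # padding character stated in the project description

# Assumed alphabet: the padding symbol followed by the 20 standard amino acids.
ALPHABET = PAD + "ACDEFGHIKLMNPQRSTVWY"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def fix_length(seq: str, max_len: int = MAX_LEN, pad: str = PAD) -> str:
    """Cut the sequence at max_len residues, or right-pad it up to max_len."""
    return seq[:max_len].ljust(max_len, pad)


def encode(seq: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a sequence to integer indices; unknown residues share the pad index (a simplification)."""
    return [CHAR_TO_INDEX.get(ch, 0) for ch in fix_length(seq, max_len)]


short = fix_length("MKT")
long_ = fix_length("A" * 600)
print(len(short), len(long_))      # both are exactly MAX_LEN characters long
print(encode("MKT", max_len=5))    # three residue indices followed by two pad indices
```

Every sequence then has the same length, so the encoded sequences can be stacked into a single input array for the network.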
In this problem, the aim was to identify the number of ion channels open at each time point from time-series electrophysiology data. Ion channels encode learning and memory, help fight infections, enable pain signals, and stimulate muscle contraction. If scientists could study ion channels more effectively, which may be possible with the aid of machine learning, it could have a far-reaching impact. My approach was to perform sequence prediction using a bidirectional LSTM model along with undersampling and class reweighting. The model achieved an F1-score of 94.0%.
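The two imbalance-handling steps named above can be sketched without any deep-learning framework. The description does not state the exact weighting or sampling scheme, so the inverse-frequency weights and the per-class cap below are assumptions, as are the function names and toy labels.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def undersample_indices(labels: Sequence[int], max_per_class: int, seed: int = 0) -> List[int]:
    """Randomly keep at most max_per_class sample indices for each class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    kept: List[int] = []
    for indices in by_class.values():
        if len(indices) > max_per_class:
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)
    return sorted(kept)


# Toy labels: number of open channels per sample, made up for illustration.
labels = [0] * 6 + [1] * 3 + [2] * 1
print(balanced_class_weights(labels))
print(undersample_indices(labels, max_per_class=3))
```

For a sequence model, undersampling would more likely be applied to whole windows of the signal than to single time points, while the weights would be passed to the loss function so that rare channel counts contribute more to training.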
The data consist of measurements of electric power consumption in one household, sampled once per minute over a period of almost 4 years. The aim of the project is to perform time-series forecasting and predictive analysis on the Global_active_power variable, which represents the total power consumption. The project contains:
Ball bearings are a crucial component in any wind turbine. The condition of the ball bearings is monitored to ensure there is no unexpected downtime of the turbine. A ball bearing consists of an outer ring, balls, a cage, and an inner ring. A ball bearing can be damaged in several ways, the most common being a dent in either the inner or the outer ring.
Such a dent causes distinct fault frequencies to appear as a function of the rotation speed of the shaft inside the ball bearing. The "Ball Pass Frequency Outer" (BPFO) is the frequency at which the balls pass over a single dent in the outer ring; the manufacturer typically specifies it as a multiple of the rotation speed. Every time a ball passes over the dent, it causes a spike in the vibration captured by the data acquisition equipment. This causes harmonics of the fault frequency (BPFO) to appear in the vibration data.
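Because the BPFO is given as a multiple of the shaft speed, the frequencies at which to look for fault harmonics follow directly from the rotation speed. A minimal sketch, in which the function names, the 1500 rpm shaft speed, and the multiplier of 3.5 are hypothetical values rather than data from a real bearing:

```python
from typing import List


def rpm_to_hz(rpm: float) -> float:
    """Convert shaft speed from revolutions per minute to hertz."""
    return rpm / 60.0


def bpfo_harmonics(shaft_hz: float, bpfo_multiplier: float, n_harmonics: int = 3) -> List[float]:
    """Return the BPFO and its higher harmonics in hertz.

    bpfo_multiplier is the manufacturer-specified ratio BPFO / shaft speed.
    """
    bpfo = shaft_hz * bpfo_multiplier
    return [k * bpfo for k in range(1, n_harmonics + 1)]


def matches_fault(peak_hz: float, harmonics: List[float], tol_hz: float = 0.5) -> bool:
    """Check whether a spectral peak lies within tol_hz of any expected harmonic."""
    return any(abs(peak_hz - h) <= tol_hz for h in harmonics)


shaft_hz = rpm_to_hz(1500.0)                               # hypothetical shaft speed
harmonics = bpfo_harmonics(shaft_hz, bpfo_multiplier=3.5)  # hypothetical multiplier
print(harmonics)
print(matches_fault(175.2, harmonics), matches_fault(120.0, harmonics))
```

Peaks in the vibration spectrum that line up with several of these harmonics at once are a stronger indication of an outer-ring fault than a single matching peak.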
In this project, I tackle the following tasks: