About Me
Hi! Welcome to my webpage.
Education
MS in Data Informatics • Jan 2019 - Present • California, USA
- Coursework - INF 551: Foundations of Data Management, INF 552: Machine Learning for Data Informatics.
Integrated Master of Technology in Information Technology • August 2013 - July 2018 • Karnataka, India
- Selected Coursework - Introduction to Automata Theory and Computability, Foundations for Big Data Algorithms,
Artificial Intelligence, Machine Learning, Data Analytics, Discrete Mathematics, Linear Algebra, Calculus,
Probability, Statistics.
- In my final semester at IIITB, I wrote my thesis in the area of Extreme Classification. Details are given under Master's thesis in the Research section.
Experience
Research
Research Assistant • May 2019 - Present • USC Dornsife College of Letters, Arts and Sciences, USA
- Working on two projects, Neural Text Analysis Pipeline (NTAP) and Language Variety (LV). NTAP is a text analysis tool; LV studies the correlation between various cognitive tasks and users' consumption of media (movies, news, etc.).
- Fixed bugs and improved the system's software design using object-oriented programming (OOP) concepts such as association and inheritance.
- Researched methods to reduce overfitting during model training, in particular early stopping.
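Early stopping halts training once the validation loss stops improving. The following is a minimal, framework-agnostic sketch of patience-based early stopping; the class name `EarlyStopper`, its `patience`/`min_delta` parameters, and the simulated losses are illustrative assumptions, not the project's code (Keras ships its own `EarlyStopping` callback).

```python
class EarlyStopper:
    """Hypothetical patience-based early stopping helper (illustrative only)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def should_stop(self, epoch, val_loss):
        """Record this epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Simulated validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.53, 0.49]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.should_stop(epoch, loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch {stopper.best_epoch}")
```

Training stops at epoch 6 and never sees the later 0.49, which illustrates the trade-off: a larger `patience` tolerates longer plateaus at the cost of extra epochs.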
Research Assistant • May 2019 - Present • USC Keck School of Medicine, USA
- Implemented a deep neural network classifier in Keras that assigns bacteria data to different classes, and researched methods to reduce overfitting during training, in particular early stopping and dropout.
- Applied deep clustering methods based on SpectralNet (which performs spectral clustering with deep neural networks) to group cells with similar expression profiles.
- Used Deep Feature Selection models to select important features (genes) at the input level while capturing non-linear relationships between features.
Master's thesis • Jan 2018 - June 2018 • IIITB, India
- Worked on an algorithm that automatically assigns a set of domain tags to an educational video, represented as a document containing the video’s subtitles.
- This problem is a use case of Extreme Classification. I experimented with CNNs, in particular XML-CNN, to solve it.
Work
Software Developer Intern • May 2016 - July 2016 • Karnataka, India
- Worked on a project that reports the air quality of various countries using metrics such as the Air Quality Index (AQI).
- Implemented the back end (in PHP) for the algorithms that compute these metrics, and exposed the algorithms through REST APIs.
Publications
Under Review
Randomness Efficient Feature Hashing for Sparse Binary Data
- Proposed randomness-efficient sketching algorithms for sparse binary datasets that keep the sketched dataset binary while using significantly fewer random bits.
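To illustrate what "binary after sketching" and "fewer random bits" can mean, here is a generic sketch, not the algorithm proposed in the paper: features are hashed into buckets and OR-ed together, so the sketch stays binary, and the bucket mapping comes from a universal hash family that needs only two random integers instead of one random bucket per feature. The prime, bucket count, and all function names are illustrative assumptions. Note that the raw similarity of two sketches is biased by bucket collisions; this sketch makes no attempt to correct for that.

```python
import math
import random

PRIME = 2_147_483_647  # 2**31 - 1, prime and larger than any feature index used below


def make_bucket_hash(n_buckets, seed=0):
    """Universal hash h(i) = ((a*i + b) mod p) mod n_buckets.

    Only two random integers (a, b) are drawn, instead of one random
    bucket per feature as in a fully random mapping.
    """
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    return lambda i: ((a * i + b) % PRIME) % n_buckets


def binary_sketch(active_indices, bucket_hash):
    """OR-compress a sparse binary vector, given as the set of indices holding a 1.

    A bucket is 1 if any feature hashed into it is 1, so the sketch stays binary.
    """
    return {bucket_hash(i) for i in active_indices}


def binary_cosine(u, v):
    """Cosine similarity of two binary vectors given as sets of 1-indices."""
    if not u or not v:
        return 0.0
    return len(u & v) / math.sqrt(len(u) * len(v))


h = make_bucket_hash(n_buckets=64, seed=42)
x = {3, 17, 256, 4096, 70000}
y = {3, 17, 256, 5000, 80000}
sx, sy = binary_sketch(x, h), binary_sketch(y, h)
print(binary_cosine(x, y), binary_cosine(sx, sy))
```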
Scaling up Simhash
- Proposed Simsketch, a simple and efficient sketching algorithm that can be applied on top of the output of Simhash.
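For context, Simhash compresses a weighted feature set into a short fingerprint whose Hamming distance tends to be small for similar inputs. The sketch below is a common hash-based variant of Charikar's Simhash, not the Simsketch algorithm from the paper; the 64-bit width, the use of MD5 as the per-feature hash, and the toy documents are arbitrary assumptions.

```python
import hashlib


def simhash(features, bits=64):
    """Hash-based Simhash fingerprint of a {feature: weight} dict."""
    totals = [0.0] * bits
    for feature, weight in features.items():
        # A deterministic hash supplies a pseudo-random +1/-1 sign per bit position.
        h = int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest(), "big")
        for i in range(bits):
            if (h >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    # Each fingerprint bit is set when its accumulated weighted vote is positive.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


doc_a = {"sparse": 1.0, "binary": 1.0, "data": 1.0, "sketch": 1.0}
doc_b = {"sparse": 1.0, "binary": 1.0, "data": 1.0, "hash": 1.0}
doc_c = {"football": 1.0, "draft": 1.0, "league": 1.0, "team": 1.0}

print(hamming(simhash(doc_a), simhash(doc_b)))  # shares 3 of 4 features with doc_a
print(hamming(simhash(doc_a), simhash(doc_c)))  # shares none
```

Documents that share more features are expected, though not guaranteed, to have fingerprints at a smaller Hamming distance.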
Cosine Similarity Preserving Compression for Sparse Binary Data
Projects
Deep Learning
Neural Text Analysis Pipeline (NTAP)
- NTAP is a Python package built on top of TensorFlow, pandas, scikit-learn, and other libraries that provides core text analysis functionality using modern Natural Language Processing methods.
- Wrote tests, fixed bugs, and added features such as early stopping to avoid overfitting and grid search for hyperparameter tuning.
Next-word prediction for text completion using deep learning
- Built a Long Short-Term Memory (LSTM) model that predicts the next word in a sentence using GloVe embeddings.
Text generation using deep learning
- Trained a Long Short-Term Memory (LSTM) model on character-level input to mimic Bertrand Russell’s writing style and thoughts.
Digit recognition
- Implemented a simple neural network in TensorFlow and Keras to recognize the handwritten digits of the MNIST dataset.
Machine Learning
- Implemented a linear regression model in TensorFlow to predict California Housing prices, trained with both batch gradient descent and mini-batch gradient descent.
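Batch and mini-batch gradient descent differ only in how many examples contribute to each parameter update. A dependency-free sketch for one-feature linear regression follows; the original project used TensorFlow and the California Housing data, whereas the synthetic data, learning rate, epoch count, and batch size here are illustrative assumptions.

```python
import random


def gradient_descent(xs, ys, lr=0.1, epochs=200, batch_size=None, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error.

    batch_size=None uses the whole dataset for every step (batch gradient descent);
    an integer uses shuffled mini-batches of that size (mini-batch gradient descent).
    """
    rng = random.Random(seed)
    n = len(xs)
    size = n if batch_size is None else batch_size
    w, b = 0.0, 0.0
    indices = list(range(n))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, n, size):
            batch = indices[start:start + size]
            m = len(batch)
            # Gradients of (1/m) * sum((w*x + b - y)^2) with respect to w and b.
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = 2.0 / m * sum(e * xs[i] for e, i in zip(errors, batch))
            grad_b = 2.0 / m * sum(errors)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data on the line y = 3x + 2.
xs = [i / 10 for i in range(-20, 21)]
ys = [3 * x + 2 for x in xs]

print(gradient_descent(xs, ys))                # batch
print(gradient_descent(xs, ys, batch_size=8))  # mini-batch
```

Both runs should approach w = 3 and b = 2; the mini-batch run takes several cheaper, noisier steps per epoch instead of one full-data step.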
Regression
- Compared the performance of linear regression, multiple linear regression, polynomial regression, and K-NN regression models on the Combined Cycle Power Plant dataset. Performed hypothesis testing to remove insignificant predictors.
- Built LASSO and XGBoost models on the Communities and Crime dataset and compared their performance. Applied SMOTE to compensate for class imbalance.
- Built SVM, logistic regression, K-nearest neighbors, Naive Bayes, and a simple neural network with two hidden layers to classify processed emails as spam or not spam.
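SMOTE oversamples the minority class by creating synthetic points on the line segment between a minority example and one of its nearest minority-class neighbors. A minimal pure-Python sketch of that core idea follows; the function name, `k`, and the toy points are illustrative assumptions, and in practice a library implementation such as imbalanced-learn's `SMOTE` is typically used.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbors of the chosen point (excluding itself).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=4)
for point in new_points:
    print([round(v, 3) for v in point])
```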
K-NN Classification
- Used scatterplots and boxplots to gain preliminary insights into the Vertebral Column dataset, then applied K-NN for classification.
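K-NN classifies a point by a majority vote among the labels of its k closest training points. A minimal sketch with Euclidean distance follows; the toy points and labels are made up for illustration and are not taken from the Vertebral Column dataset.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for query by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data with made-up labels.
train_points = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [6.0, 6.5], [7.0, 6.0], [6.5, 7.0]]
train_labels = ["Normal", "Normal", "Normal", "Abnormal", "Abnormal", "Abnormal"]

print(knn_predict(train_points, train_labels, [1.8, 1.2]))  # → Normal
print(knn_predict(train_points, train_labels, [6.2, 6.1]))  # → Abnormal
```

Choosing an odd `k` avoids ties in binary problems; in practice `k` is tuned, for example by cross-validation.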
Binary and Multi-Class Classification
- Compared the performance of logistic regression and Naive Bayes classifiers on the AReM dataset.
Multi-Class and Multi-Label Classification
- Used SVM models for classification on the Anuran Calls (MFCCs) dataset and compared their performance with the K-means clustering algorithm.
- Analysed the Kaggle speed dating dataset to discover the dating preferences of each gender, using R for association rule mining, clustering, and descriptive analytics.
Drafting an optimal fantasy football team
- Applied machine learning models such as Naive Bayes, Random Forests, and SVMs to draft an optimal fantasy football team for the English Premier League.
Hackathons
LAHacks 2019
Winner - Honey Best Fairness Hack • March 29 2019 - March 31 2019 • UCLA, USA
- NewsByText is an SMS-based service that lets users read summarized news about trending topics without an internet connection.
- Collaborated with 3 other team members; implemented an algorithm that processes, filters, and combines news information obtained from the Taboola API.
EurekaHacks 2019
Winner - FabFitFun Challenge • March 24 2019 - March 25 2019 • USC, USA