MACHINE LEARNING PROJECTS
Machine Learning Projects and Their Short Descriptions¶
Full Stack Machine Learning¶
ChatClarity¶
WhatsApp Chat Analyzer and Context-Based Searching
This web application analyzes WhatsApp group chats. It generates the following statistics from an exported group chat (*.txt file):
- The most used words (in the past n days).
- The most active members (in the past n days).
- Anomalies in message counts.
- Username-based searching.
- Context-based searching.
It can handle a large number of text messages in very little time, because we use traditional ML methods and convert the text file input into a tabular format with RegEx and Pandas. We then use spaCy for named entity recognition and stop-word removal. Flask takes the query from the frontend (made with React by @ToukirAhmedKhan), and KNN serves the context-based search results.
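The text-to-table step can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the line format, the `LINE_RE` pattern, and the `parse_chat` helper are assumptions, and real exports differ by phone locale and app version.

```python
import re
import pandas as pd

# Assumed line format (it varies by phone locale and app version):
#   "12/31/21, 9:15 PM - Alice: Happy new year"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[APap][Mm])?) - "
    r"(?P<user>[^:]+): (?P<message>.*)$"
)

def parse_chat(text: str) -> pd.DataFrame:
    """Turn an exported chat into one row per message (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groupdict())
        elif rows:
            # A line without a date/user header continues the previous message.
            rows[-1]["message"] += "\n" + line
    return pd.DataFrame(rows, columns=["date", "time", "user", "message"])

# Made-up sample text, for illustration only.
sample = (
    "12/31/21, 9:15 PM - Alice: Happy new year\n"
    "12/31/21, 9:16 PM - Bob: Same to you\n"
    "and to everyone else\n"
    "1/1/22, 8:00 AM - Alice: Good morning\n"
)
chat = parse_chat(sample)

# Messages per user, the basis for a "most active members" statistic.
most_active = chat["user"].value_counts()
```

Once the chat is in a DataFrame, statistics such as message counts per user or per day reduce to ordinary Pandas group-by and count operations. System lines without a `user:` part (for example join notices) are treated as continuations in this sketch and would need their own rule in practice.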
Improvements:
- Separating important messages from group chit-chat.
Data Analytics¶
Geospatial Mapping of Mammals' Habitats (UK)¶
Geospatial analysis of rare mammals in the UK and their possible range of habitats
The dataset contains mammal sightings in the UK from the NBN Atlas. It includes geospatial information on where sightings have occurred, as well as biological information on the sighted animals so that specific taxonomies can be filtered. The data has been modified to remove redundant columns and anonymise the records.
- A functional approach inside the notebooks for a clean workflow
- Extensive data visualization to surface fine-grained insights
- State/province-based analysis of mammal locations
- Mapping the whole UK and marking each mammal's habitat (with a circle for its possible range)
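One simple way to derive such a range circle from sighting coordinates is sketched below. It is an assumption about how a circle could be computed, not necessarily what the notebooks do; `haversine_km` and `range_circle` are hypothetical helper names.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def range_circle(sightings):
    """Return (centre_lat, centre_lon, radius_km) covering all sightings.

    The centre is the plain mean of the coordinates, which is only an
    approximation but is reasonable over an area the size of the UK.
    """
    lats = [lat for lat, _ in sightings]
    lons = [lon for _, lon in sightings]
    centre_lat = sum(lats) / len(lats)
    centre_lon = sum(lons) / len(lons)
    radius_km = max(
        haversine_km(centre_lat, centre_lon, lat, lon) for lat, lon in sightings
    )
    return centre_lat, centre_lon, radius_km

# Made-up (latitude, longitude) pairs, for illustration only.
example_sightings = [(51.0, -1.0), (53.0, -1.0), (52.0, -1.5)]
centre_lat, centre_lon, radius_km = range_circle(example_sightings)
```

The resulting centre and radius can then be passed to whichever mapping library draws the circle on the UK map.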
Hotel Price Data Analysis (Bangalore, India) ¶
Analysis of hotel prices with respect to location, rating, and tourism, using data from MakeMyTrip.com (Bangalore)
The dataset is available on Kaggle: 🔗 HERE.
The data was analyzed to extract details such as:
- Hotel qualities
- Which kinds of hotels the average customer chooses
- Whether a high rating actually increases the number of customers
- Average hotel price
- Important landmarks based on hotel price and number of customers, and more.
Steps:
- Cleaning the data: renaming some columns, dropping unnecessary columns, etc.
- Visualizing the missing values (with the seaborn and missingno libraries)
- Plotting correlations in the data [e.g. Price and Tax have a strong correlation]
- Plotting data individually [like Places vs Price, Reviews vs Price etc.]
- Conclusion about the analysis.
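The cleaning, missing-value, and correlation steps above can be sketched in Pandas as follows. The rows are made up and the column names are hypothetical stand-ins, not the actual Kaggle schema.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the Kaggle data; column names are hypothetical.
raw = pd.DataFrame({
    "Hotel Name": ["A", "B", "C", "D"],
    "Price": [1200.0, 2500.0, np.nan, 4000.0],
    "Tax": [144.0, 300.0, 360.0, 480.0],
    "Rating": [3.9, 4.2, np.nan, 4.6],
    "Unused": [None, None, None, None],
})

# Cleaning: rename columns and drop one that carries no information.
hotels = (
    raw.rename(columns={"Hotel Name": "hotel_name", "Price": "price",
                        "Tax": "tax", "Rating": "rating"})
       .drop(columns=["Unused"])
)

# Missing values per column (the counts behind a missingno or seaborn plot).
missing = hotels.isna().sum()

# Pairwise correlation between the numeric columns
# (rows with a missing value are skipped pair by pair).
corr = hotels[["price", "tax", "rating"]].corr()
```

The `corr` table is what a seaborn heatmap would visualize. The plotting calls are left out here so the sketch runs without a display.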
Data Science¶
Diabetes Patient Classification¶
Predicting Diabetes with KNeighborsClassifier
KNeighborsClassifier is a powerful classification algorithm that can learn non-linear decision boundaries. However, the model requires some feature engineering, so I performed the following steps to fit the data and get the most out of it:
- Imputed missing values
- Scaled the data so that every feature carries equal weight during training
After training the model, I tuned the hyperparameters with the most commonly used method: GridSearch.
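A minimal sketch of this workflow with scikit-learn is shown below. The data is synthetic, and the median imputation strategy and the parameter grid are assumptions for illustration, not the values used in the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data, with about 5% of values removed.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Impute, then scale, then classify. Keeping all three steps in one pipeline
# means the imputer and scaler are fitted only on the training folds.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # assumed strategy
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Assumed search space; the "knn__" prefix targets the pipeline's last step.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
```

Scaling matters for KNN in particular because the algorithm works on distances, so a feature with a large numeric range would otherwise dominate the neighbour search.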
Reverse Image Searching¶
What is Reverse Image Searching?¶
Reverse image searching is a technique for finding images that are similar to a given image. It is useful for finding the original source of an image, finding different versions of an image, or finding information about an image.
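As a rough illustration of finding similar images, the sketch below compares images by a simple average hash and Hamming distance. This technique is chosen here for brevity and is not necessarily what the project uses; `average_hash` and `most_similar` are hypothetical helpers that expect 2-D grayscale arrays at least 8 pixels on each side.

```python
import numpy as np

def average_hash(image: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Boolean fingerprint of a 2-D grayscale image.

    The image is shrunk to hash_size x hash_size by block averaging, then
    each cell is compared against the mean brightness of the shrunk image.
    """
    h, w = image.shape
    # Crop so both sides divide evenly into hash_size blocks.
    image = image[: h - h % hash_size, : w - w % hash_size]
    blocks = image.reshape(hash_size, image.shape[0] // hash_size,
                           hash_size, image.shape[1] // hash_size)
    small = blocks.mean(axis=(1, 3))
    return (small > small.mean()).ravel()

def most_similar(query: np.ndarray, library: dict):
    """Return (name, hamming_distance) of the closest image in the library."""
    query_hash = average_hash(query)
    distances = {
        name: int(np.count_nonzero(query_hash != average_hash(img)))
        for name, img in library.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]

# Made-up images: a horizontal gradient, random noise, and a query that is
# a re-exposed (darker, offset) copy of the gradient.
rng = np.random.default_rng(0)
gradient = np.tile(np.arange(64.0), (64, 1))
noise = rng.random((64, 64))
query = gradient * 0.9 + 10.0

name, distance = most_similar(query, {"gradient": gradient, "noise": noise})
```

A hash like this tolerates resizing and brightness changes but not heavy crops or rotations. Systems that need that robustness typically compare learned feature vectors with a nearest-neighbour search instead.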
Use cases¶
Reverse image searching can be used for a variety of purposes, such as:
- Finding the original source of an image, such as a photo or painting
- Finding different versions of an image, such as different sizes or crops
- Finding information about an image, such as its subject matter or creator
- Checking if an image is copyrighted
- Finding out if someone is using your images without permission