Building a Fake Internship Detector: My Journey with ML and Streamlit
Published on June 23, 2025 | By Mohd Shami
As a budding tech enthusiast, I’ve seen firsthand how internship scams prey on eager job seekers. This inspired me to create the Fake Internship Detector, a web app that combines machine learning (ML) with rule-based logic to tell genuine internship offers from fake ones. Launched today, this project is a personal milestone in my learning journey!
The Idea and Challenge
Internship scams often hide behind enticing emails or vague offers, asking for payments or writing from free email domains like @gmail.com instead of a company address. I wanted to build a tool to empower students and professionals. I started with a command-line script and evolved it into an interactive web app using Streamlit. The model behind it is trained on a custom dataset of 100 offers (50 genuine, 50 fake).
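To make the red flags above concrete, here is a minimal sketch of how they could be turned into rules. It is not the app's actual code: the function name find_red_flags, the list of free email domains, and the payment phrases are all assumptions I picked for illustration.

```python
import re

# Hypothetical rule set: these domains and phrases are illustrative assumptions,
# not the rules the app actually ships with.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
PAYMENT_PATTERN = re.compile(
    r"\b(registration fee|security deposit|processing fee)\b",
    re.IGNORECASE,
)
# Captures the domain part of anything that looks like an email address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def find_red_flags(offer_text: str) -> list[str]:
    """Return a human-readable list of suspicious patterns found in the offer."""
    flags = []
    if PAYMENT_PATTERN.search(offer_text):
        flags.append("asks for a payment")
    for domain in EMAIL_PATTERN.findall(offer_text):
        if domain.lower() in FREE_EMAIL_DOMAINS:
            flags.append(f"contact address uses a free email domain ({domain.lower()})")
    return flags


suspicious = find_red_flags(
    "Pay a registration fee of Rs. 500 to hr.team@gmail.com to confirm your seat."
)
clean = find_red_flags("Join our summer program. Contact careers@example.org for details.")
```

Rules like these are easy to explain to a user, which is why they pair well with a model: the rule output can tell someone why an offer looks risky, not just that it does.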
Tech Behind the Scenes
The app runs on Python 3.13, with NLTK for NLP preprocessing (tokenization, stopword removal) and scikit-learn for a Logistic Regression model trained on TF-IDF features. Joblib saves the trained model and vectorizer to disk so the app can load them at startup instead of retraining. I added rule-based checks to flag suspicious patterns and integrated smart suggestions, like safety tips for fake offers or career advice for genuine ones.
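For readers who want to see the shape of that stack, here is a minimal sketch of a TF-IDF plus Logistic Regression classifier saved and reloaded with Joblib. It rests on a few assumptions: the sample offers are made up, the file name internship_model.joblib is hypothetical, scikit-learn's built-in English stopword list stands in for the NLTK preprocessing, and the vectorizer and model are bundled in one pipeline rather than saved separately.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up sample offers for illustration only (1 = fake, 0 = genuine).
texts = [
    "Pay a registration fee to secure your internship slot today",
    "Send your bank details and a security deposit to get the offer letter",
    "Guaranteed internship, no interview, just transfer the processing fee",
    "Urgent hiring, limited seats, pay now to confirm",
    "We are pleased to offer you a summer internship on our data team",
    "Your interview is scheduled with the engineering manager next week",
    "The internship includes a monthly stipend and a mentor from our team",
    "Please complete the coding assessment linked on our careers page",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The pipeline keeps the vectorizer and classifier together, so one file holds both.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# A temporary directory keeps this sketch self-contained; a real app would
# write to a fixed path next to app.py.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "internship_model.joblib"
    joblib.dump(model, model_path)
    loaded = joblib.load(model_path)

prediction = loaded.predict(["Transfer the registration fee to confirm your internship"])
probabilities = loaded.predict_proba(["Transfer the registration fee to confirm your internship"])
```

Bundling both steps in one pipeline is a design choice for this sketch: it guarantees that new text goes through exactly the same TF-IDF vocabulary the model was trained with.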
Development Journey
The process was a rollercoaster! I tackled issues like missing NLTK resources and refined the dataset using data.py. The shift to Streamlit brought a user-friendly interface with input fields and examples. Achieving 90%+ accuracy on test data was rewarding, though with only 100 offers a single test split is small, so I’m exploring cross-validation to check that the result holds up.
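Here is a rough sketch of what that cross-validation could look like with scikit-learn. The helper name cross_validate_offers, the placeholder offers, and the choice of 3 stratified folds are my assumptions for the example, not results or settings from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def cross_validate_offers(texts, labels, n_splits=5):
    """Return one accuracy score per fold for a TF-IDF + Logistic Regression pipeline."""
    # The vectorizer lives inside the pipeline so it is refit on each training
    # fold and never sees the held-out fold's text.
    pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    # Stratified folds keep the genuine/fake ratio the same in every split.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    return cross_val_score(pipeline, texts, labels, cv=folds, scoring="accuracy")


# Placeholder data for illustration only (1 = fake, 0 = genuine).
sample_texts = [
    "Pay a registration fee to confirm your internship",
    "Send a security deposit to receive the offer letter",
    "No interview needed, transfer the processing fee now",
    "Limited seats, pay today to join the internship",
    "Deposit the training fee to activate your internship",
    "Share your bank details to claim the internship offer",
    "We are pleased to offer you a summer internship",
    "Your interview with the engineering team is scheduled",
    "The internship includes a stipend and a mentor",
    "Please complete the assessment on our careers page",
    "Our recruiter will call you to discuss the role",
    "Attached is the offer letter signed by the HR manager",
]
sample_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

scores = cross_validate_offers(sample_texts, sample_labels, n_splits=3)
mean_accuracy = scores.mean()
```

Looking at the spread of the per-fold scores, not just the mean, gives a better feel for how much the accuracy depends on which offers land in the test split.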
What’s Next?
I plan to expand the dataset, deploy the app on Streamlit Community Cloud, and add bulk upload features. Check out the code on GitHub and try it locally with streamlit run app.py.
Call to Action
Feedback or collaboration ideas? Connect with me on LinkedIn (https://www.linkedin.com/in/mohd-shami-792133276/) or drop a comment! Let’s make job hunting safer together. #MachineLearning #NLP #WebDevelopment #TechForGood