Step-by-step tutorial to create some of the most in-demand projects like computer vision, NLP, LLMs, and more. Impress recruiters with these data science projects.
Blog
When it comes to data science roles, recruiters are highly interested in seeing practical projects that show your coding abilities and understanding of key concepts.
Simply listing courses or certifications is not enough - they want tangible examples of you building and deploying models or applications.
The ideal projects should: π
β Involve end-to-end implementation from data collection to model deployment
β Solve a real-world problem or have an innovative/creative use case
β Utilise popular Python libraries, frameworks and modern techniques
β Be deployable and presentable as a product demo if possible
In this article, we'll discuss some compelling Python project ideas that can make your data science resume stand out. β
From working with large language models to building interactive apps - these hands-on projects cover diverse areas that are in high demand.
We'll go through step-by-step on how to build each one, along with tips on making them portfolio-worthy.
Whether you're a student or an experienced professional, executing these projects will amp up your practical experience dramatically. π
Let's dive in!
First things first, what kind of projects to choose?
Do you go for the end-to-end one, where you tackle a problem from start to finish, or do you select super niche projects that demand a ton of expertise? π€
Let's break it down.
π End-to-end projects are your all-in-one package. You start with a problem, crack it wide open, train a model using the data, and then deploy it. It's a neat journey from identifying the problem to presenting the solution.
πOn the other side, there are these super niche projects. They require niche expertise, intricate details, and a considerable amount of time. But here's the kicker: they might take forever to deploy.
That's the trade-off.
Take Netflix, for example.
They ran a competition to improve their recommendation system, giving $1 million for the top spot. Well, turns out, they never used the winning model.
Why? Because the efforts and costs to deploy the model were too much.
Moral of the story: Opt for projects that are simple and deployable. β
Google has created a platform that allows you to train models on your own unstructured data easily.
Here are the steps to get started: π
Once training is completed, you can test your model by providing new image inputs. It will predict if you are present (class 1) or not (class 2) with a confidence score.
This exercise shows how effortless it is to build and deploy custom machine learning models on Google's platform using your data.
This is a simple model and can be an inspiration for you to create something similar.
If you want to get into Natural Language Processing (NLP) or already know some kind of NLP language and want to do a basic project, the Quora question pair dataset by Kaggle is a great resource.
This dataset contains pairs of questions taken from the Q&A website Quora, and your goal is to detect whether these two questions are semantically similar or not.
This similarity detection is useful for Quora's platform to identify duplicate questions and guide users to existing answers instead of posting redundant ones.
Working with this dataset allows you to learn the fundamentals of NLP by building models that can understand the semantics of text and determine if two pieces of text convey the same meaning or not. β
Recently, I built an object detection app as a computer vision project. I used the Streamlit framework to create a web application interface.
The app allows users to upload an image, and it detects and identifies vehicles like buses or cars present in that image by drawing bounding boxes around them.
This project shows how to build an end-to-end computer vision system that can be deployed as a practical application.
The core components involve using deep learning models for object detection, integrating them into a user-friendly web app, and providing a seamless experience for users to analyse their images.
You can watch the full tutorial here: π
Large Language Models or LLMs are the latest trend in the field of natural language processing.
If you want to gain experience with LLMs, a great project idea is to take a dataset from the Hugging Face platform, choose a pre-trained LLM like LLaMa or Falcon (which are open-source), and fine-tune it on your chosen dataset.
The fine-tuning process involves taking the generalised knowledge of the pre-trained LLM and specialising it for a specific task using your dataset.
You can also use GPT-3 or the latest GPT-4 from OpenAI if you have access to their APIs.
Fine-tuning these powerful models allows you to create customised language models tailored to your use case, like question-answering, text summarisation, or any other language task. β
Having a project showcasing your LLM fine-tuning skills is an impressive addition to your resume.
This project is about creating a virtual keyboard that you can control using your hands and gestures, instead of a physical keyboard.
βΆοΈ To do this, set up a camera pointing down at a flat surface, like a table or desk.
βΆοΈ Using a Python library called OpenCV, process the video feed from the camera. This involves removing the background, applying filters, and detecting the movement of your hands or fingers in the video.
βΆοΈ Next, divide the flat surface into different regions, where each region represents a key on a standard keyboard layout. For example, one region for the 'A' key, another for 'B', and so on.
βΆοΈ Whenever your hand or finger enters one of these key regions, the program will detect it and trigger that specific key press, just like you pressed it on a physical keyboard.
βΆοΈ To make this trigger the actual key presses, connect your Python program with another library that can control keyboard inputs on your computer.
This way, you can use the virtual keyboard with any application on your computer, just by moving your hands over the tracked surface.
It can be really useful for people with disabilities or as a cool, innovative input method.
Additionally, you can make the virtual keyboard even smarter by training it to recognise multiple hands or objects. This would allow for more advanced gesture recognition and controls.
The key parts are using computer vision to track hand movements, mapping those movements to key regions, and then triggering keyboard inputs based on the regions - all done through coding in Python! β
You can watch the full process in this video:
To summarise, the core things that will make your data science projects shine are: π
β‘ Solve a simple but practical problem end-to-end - From data collection and preprocessing to model training, evaluation and deployment. Don't just stop at the model.
β‘ Create an API endpoint to expose your model's predictions and make it easily accessible and usable.
β‘ Build an interactive Streamlit app as a front-end interface for your project. It makes the whole experience user-friendly.
β‘ Visualise your model's outputs by connecting it to a dashboard. This allows anyone to see and interpret the predictions in a visual format.
The combination of coding skills, creating usable products, and effective presentations - that's what will make recruiters stop and take notice of your projects.
If you need help with any part of the project development process or need guidance on making your portfolio, feel free to book a 1:1 call with me.
I'm happy to review your work, provide feedback, and share tips based on what tech companies are looking for. β
My goal is to help you land your dream data science role by perfecting your practical skills showcased through awesome Python projects!
So what are you waiting for? Get started on one of these project ideas today and take that first step towards an impressive data science resume. πͺ
You got this!
Copyright Β©2024 Preplaced.in
Preplaced Education Private Limited
Ibblur Village, Bangalore - 560103
GSTIN- 29AAKCP9555E1ZV