5 Best Python Projects for Your Data Science Resume

Step-by-step tutorial to create some of the most in-demand projects like computer vision, NLP, LLMs, and more. Impress recruiters with these data science projects.

When it comes to data science roles, recruiters are highly interested in seeing practical projects that show your coding abilities and understanding of key concepts.

Simply listing courses or certifications is not enough - they want tangible examples of you building and deploying models or applications.

The ideal projects should: 👇

✅ Involve end-to-end implementation from data collection to model deployment

✅ Solve a real-world problem or have an innovative/creative use case

✅ Utilise popular Python libraries, frameworks and modern techniques

✅ Be deployable and presentable as a product demo if possible

In this article, we'll discuss some compelling Python project ideas that can make your data science resume stand out. ✅

From working with large language models to building interactive apps - these hands-on projects cover diverse areas that are in high demand.

We'll walk through how to build each one step by step, along with tips on making them portfolio-worthy.

Whether you're a student or an experienced professional, building these projects will give you hands-on experience you can point to. 🚀

Let's dive in!

What Kind of Projects to Choose?

First things first: what kind of projects should you choose?

Do you go for end-to-end projects, where you tackle a problem from start to finish, or do you pick super niche projects that demand a ton of expertise? 🤔

Let's break it down.

👉 End-to-end projects are your all-in-one package. You start with a problem, dig into it, train a model on the data, and then deploy it. It's a neat journey from identifying the problem to presenting the solution.

👉 On the other side, there are super niche projects. They require specialised expertise, attention to intricate details, and a considerable amount of time. But here's the kicker: they might take forever to deploy.

That's the trade-off.

Take Netflix, for example.

They ran a competition, the Netflix Prize, to improve their recommendation system, offering $1 million for the top spot. As it turned out, they never put the full winning model into production.

Why? Because the engineering effort and cost of deploying the model were too high.

Moral of the story: Opt for projects that are simple and deployable. ✔

Project Examples (What to Build)

📌 Train on unstructured data online

Google has created a platform called Teachable Machine that lets you train models on your own unstructured data easily.

Here are the steps to get started: 👇

  1. Go to the platform and click "New Project" in the top-left bar. You'll see three project options.
  2. Choose the Image Project option. This allows you to train an image classification model on your own data.
  3. Upload your image data manually. The data size is usually small for these kinds of projects.
  4. Once the data is uploaded, the platform uses a pre-trained model as a starting point and fine-tunes it on your data to create a custom model.
  5. For example, let's train a model to detect whether you are present in an image. Create two classes: one for images with you, and another for images without you.
  6. Record video footage of yourself moving around to capture images with variation for the first class. Then record images without you in the frame for the second class.
  7. After adding data for both classes, click "Train Model". You can adjust hyperparameters like epochs, batch size, and learning rate as needed. More epochs are not always better: on a small dataset, too many can make the model overfit.

Once training is completed, you can test your model by providing new image inputs. It will predict if you are present (class 1) or not (class 2) with a confidence score.

This exercise shows how effortless it is to build and deploy custom machine learning models on Google's platform using your data.

This is a simple model and can be an inspiration for you to create something similar.
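To make the "confidence score" part concrete, here is a minimal sketch of how per-class scores can be turned into a label plus a confidence value. It assumes you have raw scores for each class; an exported model may already return probabilities, in which case you can skip the softmax step. The label names, the 0.6 threshold, and the `describe_prediction` helper are made up for illustration and are not part of Teachable Machine.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def describe_prediction(scores, labels, min_confidence=0.6):
    """Return (label, confidence); fall back to 'uncertain' below the threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "uncertain", probs[best]
    return labels[best], probs[best]

# Hypothetical class names for the "am I in the picture?" example
LABELS = ["me", "not_me"]

label, confidence = describe_prediction([2.0, 0.5], LABELS)
print(f"{label}: {confidence:.2f}")
```

Returning "uncertain" for low-confidence predictions is a small design choice that makes a demo feel more trustworthy than always forcing one of the two classes.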

📌 Quora Question Pairs for NLP

If you want to get into Natural Language Processing (NLP), or already know some NLP and want a basic project, the Quora Question Pairs dataset on Kaggle is a great resource.

This dataset contains pairs of questions taken from the Q&A website Quora, and your goal is to detect whether the two questions in each pair are semantically similar.

This similarity detection helps Quora identify duplicate questions and guide users to existing answers instead of posting redundant ones.

Working with this dataset lets you learn the fundamentals of NLP by building models that understand the semantics of text and determine whether two pieces of text convey the same meaning. ✔
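A good first step on this kind of task is a simple baseline you can later try to beat. The sketch below scores a question pair by word overlap (Jaccard similarity). It is not a semantic model, and the 0.5 threshold and function names are assumptions chosen for illustration, but it gives you a number to compare against once you move to embeddings or neural models.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard_similarity(q1, q2):
    """Shared words divided by all distinct words across both questions."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_duplicate(q1, q2, threshold=0.5):
    """Baseline rule: call the pair a duplicate if word overlap is high enough."""
    return jaccard_similarity(q1, q2) >= threshold

pair_a = ("How do I learn Python quickly?", "How can I learn Python quickly?")
pair_b = ("How do I learn Python quickly?", "What is the capital of France?")

print(jaccard_similarity(*pair_a), is_duplicate(*pair_a))
print(jaccard_similarity(*pair_b), is_duplicate(*pair_b))
```

Word overlap fails on paraphrases that share few words, which is exactly the gap a proper NLP model should close, so it makes a nice "before" in a before/after comparison on your resume project.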

📌 Computer Vision - Object Detection App

Recently, I built an object detection app as a computer vision project. I used the Streamlit framework to create a web application interface.

The app allows users to upload an image, and it detects and identifies vehicles like buses or cars present in that image by drawing bounding boxes around them.

This project shows how to build an end-to-end computer vision system that can be deployed as a practical application.

The core components involve using deep learning models for object detection, integrating them into a user-friendly web app, and providing a seamless experience for users to analyse their images.
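Between the detection model and the drawn boxes there is usually a small post-processing step. Here is a hedged sketch of that step only: it assumes the model returns a label, a confidence, and a pixel box for each detection, and keeps just the confident vehicle detections. The `Detection` class, the label set, and the 0.5 threshold are hypothetical and not taken from the app described above.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels

# Assumed set of class names we treat as vehicles
VEHICLE_LABELS = {"car", "bus", "truck", "motorcycle"}

def select_vehicles(detections, min_confidence=0.5):
    """Keep confident vehicle detections, most confident first."""
    kept = [
        d for d in detections
        if d.label in VEHICLE_LABELS and d.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)

def caption(detection):
    """Text to draw next to a bounding box, e.g. 'car 91%'."""
    return f"{detection.label} {detection.confidence:.0%}"

# Made-up model output for one uploaded image
raw = [
    Detection("car", 0.91, (34, 50, 210, 180)),
    Detection("person", 0.88, (220, 40, 260, 170)),
    Detection("bus", 0.42, (300, 20, 620, 240)),
    Detection("bus", 0.77, (310, 25, 615, 235)),
]

vehicles = select_vehicles(raw)
print([caption(d) for d in vehicles])
```

In a web app, the filtered list is what you would loop over to draw boxes and captions on the uploaded image.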

You can watch the full tutorial here: 👇

📌 Fine-tuning Large Language Models (LLMs)

Large Language Models, or LLMs, are the latest trend in the field of natural language processing.

If you want to gain experience with LLMs, a great project idea is to take a dataset from the Hugging Face platform, choose a pre-trained LLM like LLaMA or Falcon (both openly available), and fine-tune it on your chosen dataset.

The fine-tuning process involves taking the generalised knowledge of the pre-trained LLM and specialising it for a specific task using your dataset.

You can also use GPT-3 or the latest GPT-4 from OpenAI if you have access to their APIs.

Fine-tuning these powerful models allows you to create customised language models tailored to your use case, like question-answering, text summarisation, or any other language task. ✔

Having a project showcasing your LLM fine-tuning skills is an impressive addition to your resume.
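Much of the work in a fine-tuning project is preparing the data before any training happens. The sketch below shows one possible way to do that: turn instruction/response records into single prompt strings and hold out a validation split. The prompt template, the field names, and the 80/20 split are assumptions for illustration; check the format your chosen model and training library expect.

```python
import json
import random

# Assumed prompt layout; real projects should follow the model's own template
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

def format_example(record):
    """Merge one instruction/response record into a single training text."""
    return {
        "text": PROMPT_TEMPLATE.format(
            instruction=record["instruction"].strip(),
            response=record["response"].strip(),
        )
    }

def train_validation_split(examples, validation_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def to_jsonl(examples):
    """Serialise examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(e) for e in examples)

# Tiny made-up dataset standing in for one you would download
RECORDS = [
    {"instruction": f"Summarise document {i}.", "response": f"Summary {i}."}
    for i in range(5)
]

examples = [format_example(r) for r in RECORDS]
train, validation = train_validation_split(examples)
print(len(train), len(validation))
print(to_jsonl(validation))
```

Keeping a fixed seed means you can rerun the split and compare fine-tuning runs on the same validation examples.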

📌 Virtual Keyboard Using Python

This project is about creating a virtual keyboard that you can control using your hands and gestures, instead of a physical keyboard.

▶️ To do this, set up a camera pointing down at a flat surface, like a table or desk.

▶️ Using a Python library called OpenCV, process the video feed from the camera. This involves removing the background, applying filters, and detecting the movement of your hands or fingers in the video.

▶️ Next, divide the flat surface into different regions, where each region represents a key on a standard keyboard layout. For example, one region for the 'A' key, another for 'B', and so on.

▶️ Whenever your hand or finger enters one of these key regions, the program will detect it and trigger that specific key press, just as if you had pressed it on a physical keyboard.

▶️ To make this trigger the actual key presses, connect your Python program with another library that can control keyboard inputs on your computer.

This way, you can use the virtual keyboard with any application on your computer, just by moving your hands over the tracked surface.

It can be really useful for people with disabilities or as a cool, innovative input method.

Additionally, you can make the virtual keyboard even smarter by training it to recognise multiple hands or objects. This would allow for more advanced gesture recognition and controls.

The key parts are using computer vision to track hand movements, mapping those movements to key regions, and then triggering keyboard inputs based on the regions, all done through coding in Python! ✔
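The region-mapping step above can be sketched without any camera code. This example assumes you already have a fingertip position in pixels from your tracking step, splits the frame into a simple three-row grid, and only reports a key when the finger enters a new region. The layout, the 640x480 frame size, and the `RegionTracker` helper are assumptions for illustration; sending the real key press would go through a keyboard-control library of your choice (for example, `pyautogui.press`).

```python
# Assumed layout: each string is one keyboard row, stretched across the frame width
KEY_ROWS = ["QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]

def key_at(x, y, frame_width, frame_height, rows=KEY_ROWS):
    """Return the key whose region contains pixel (x, y), or None if outside the frame."""
    if not (0 <= x < frame_width and 0 <= y < frame_height):
        return None
    row = rows[int(y * len(rows) / frame_height)]
    return row[int(x * len(row) / frame_width)]

class RegionTracker:
    """Reports a key only on the frame where the finger enters its region."""

    def __init__(self):
        self.current = None

    def update(self, key):
        entered = key if key is not None and key != self.current else None
        self.current = key
        return entered

# Made-up fingertip positions from consecutive video frames (640x480)
positions = [(320, 240), (325, 242), (-5, 10), (320, 240), (400, 240)]

tracker = RegionTracker()
presses = [tracker.update(key_at(x, y, 640, 480)) for x, y in positions]
print(presses)
```

Triggering only on entry avoids firing the same key on every frame while the finger rests over a region.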

You can watch the full process in this video:

Key Takeaways

To summarise, the core things that will make your data science projects shine are: 👇

⚡ Solve a simple but practical problem end-to-end - From data collection and preprocessing to model training, evaluation and deployment. Don't just stop at the model.

⚡ Create an API endpoint to expose your model's predictions and make it easily accessible and usable.

⚡ Build an interactive Streamlit app as a front-end interface for your project. It makes the whole experience user-friendly.

⚡ Visualise your model's outputs by connecting it to a dashboard. This allows anyone to see and interpret the predictions in a visual format.
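To show what "create an API endpoint" can look like at its smallest, here is a dependency-free sketch using Python's built-in WSGI support. In a real project you would more likely reach for a web framework; this version just keeps the idea visible. The `/predict` path, the `text` query parameter, and the placeholder `predict` function are all hypothetical stand-ins for your own model.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

def predict(text):
    """Placeholder for a real model: scores text by word count, capped at 1.0."""
    score = min(len(text.split()) / 10, 1.0)
    return {"input": text, "score": score}

def app(environ, start_response):
    """WSGI app that answers GET /predict?text=... with a JSON prediction."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    body = json.dumps(predict(text)).encode("utf-8")
    start_response("200 OK", headers + [("Content-Length", str(len(body)))])
    return [body]

def serve(port=8000):
    """Run the app locally; call serve() and open /predict?text=hello in a browser."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Once an endpoint like this exists, a front-end app or dashboard can call it instead of loading the model itself, which keeps the pieces of your project cleanly separated.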

The combination of coding skills, creating usable products, and effective presentations - that's what will make recruiters stop and take notice of your projects.

If you need help with any part of the project development process or need guidance on making your portfolio, feel free to book a 1:1 call with me.

I'm happy to review your work, provide feedback, and share tips based on what tech companies are looking for. ✅

My goal is to help you land your dream data science role by perfecting your practical skills showcased through awesome Python projects!

So what are you waiting for? Get started on one of these project ideas today and take that first step towards an impressive data science resume. 💪

You got this!