Monday, August 4, 2025

5 Routine Tasks That ChatGPT Can Handle for Data Scientists

Image by Author | Canva

 

According to the data science report by Anaconda, data scientists spend nearly 60% of their time on cleaning and organizing data. These routine, time-consuming tasks are ideal candidates for ChatGPT to take over.

In this article, we will explore five routine tasks that ChatGPT can handle if you use the right prompts, including cleaning and organizing data. We’ll use a real data project from Gett, a London-based black-cab ride-hailing app similar to Uber, used in their recruitment process, to show how it works in practice.

 

Case Study: Analyzing Failed Ride Orders from Gett

 
In this data project, Gett asks you to analyze failed ride orders by examining key matching metrics to understand why some customers did not successfully get a car.

Here is the data description.

 
Analyzing Failed Ride Orders from Gett
 

Now, let’s explore it by uploading the data to ChatGPT.

In the next five steps, we will walk through the routine tasks that ChatGPT can handle in a data project. The steps are shown below.

 
Analyzing Failed Ride Orders from Gett
 

Step 1: Data Exploration and Analysis

In data exploration, we use the same functions every time, like head, info, or describe.

When we ask ChatGPT, we’ll include the key functions in the prompt. We’ll also paste the project description and attach the dataset.

 
Data Exploration and Analysis
 

We will use the prompt below. Just replace the text inside the square brackets with the project description.

Here is the data project description: [paste here ] 
Perform basic EDA, show head, info, and summary stats, missing values, and correlation heatmap.

 

Here is the output.

 
Data Exploration and Analysis
 

As you can see, ChatGPT summarizes the dataset by highlighting key columns and missing values, and then creates a correlation heatmap to explore relationships.
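If you prefer to run the exploration yourself, the functions the prompt asks for boil down to a few pandas calls. This is a minimal sketch using a toy stand-in for the Gett orders data; in practice you would load the real file (e.g. with `pd.read_csv`), and the column names here are taken from the project description.

```python
import io
import pandas as pd

# Toy stand-in for the Gett orders dataset; replace with
# pd.read_csv("data_orders.csv") for the real project file.
csv = io.StringIO(
    "order_gk,order_status_key,m_order_eta\n"
    "1,4,60\n"
    "2,9,\n"
    "3,4,120\n"
)
df = pd.read_csv(csv)

print(df.head())                    # first rows
df.info()                           # dtypes and non-null counts
print(df.describe())                # summary statistics
print(df.isna().sum())              # missing values per column
print(df.corr(numeric_only=True))   # correlation matrix (heatmap input)
```

These are exactly the outputs ChatGPT produces in the chat, so you can cross-check its summary against your own run.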

 

Step 2: Data Cleaning

Both datasets contain missing values.

 
Data Cleaning
 

Let’s write a prompt to work on this.

Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.

 

Here is the summary of what ChatGPT did:

 
Data Cleaning
 

ChatGPT converted the date column, dropped invalid orders, and imputed the missing values in the m_order_eta column.
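The same cleaning steps can be reproduced in a few lines of pandas. This is a hedged sketch on a hypothetical slice of the data; the column names follow the Gett description, and the choice between dropping and imputing always depends on context (for instance, m_order_eta is only defined when a driver was assigned).

```python
import pandas as pd

# Hypothetical slice of the orders data, for illustration only.
df = pd.DataFrame({
    "order_datetime": ["18:08:07", "18:08:07", "bad", "12:07:50"],
    "order_status_key": [4, 9, 4, 4],
    "m_order_eta": [60.0, None, 120.0, None],
})

# 1) Convert the time column; unparseable rows become NaT.
df["order_datetime"] = pd.to_datetime(
    df["order_datetime"], format="%H:%M:%S", errors="coerce"
)

# 2) Drop orders with an invalid timestamp.
df = df.dropna(subset=["order_datetime"])

# 3) Impute remaining missing ETAs with the column median.
df["m_order_eta"] = df["m_order_eta"].fillna(df["m_order_eta"].median())

print(df.isna().sum())  # every column should now be complete
```

Asking ChatGPT to "provide a summary of the cleaning steps," as in the prompt above, makes it easy to audit decisions like these.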

 

Step 3: Generate Visualizations

To make the most of your data, it is important to visualize the right things. Instead of generating random plots, we can guide ChatGPT by providing a link to a trusted source, a simple form of Retrieval-Augmented Generation (RAG).

We will use this article. Here is the prompt:

Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, show the most suitable visualizations for this dataset, explain why each was selected, and produce the plots in this chat by running code on the dataset.

 

Here is the output.

 
Generate Visualizations
 

We have six different graphs that we produced with ChatGPT.

 
Generate Visualizations
 

For each graph, you will see why it was selected, the plot itself, and an explanation of what it shows.
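The plot-selection principles from the linked article map directly onto code: histograms for numerical distributions, bar plots for categorical counts. Here is a minimal sketch with illustrative values (the real plots should come from the actual dataset), rendered off-screen so it runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values only; the real project uses the Gett dataset.
df = pd.DataFrame({
    "order_status_key": [4, 4, 9, 9, 4],     # categorical code
    "m_order_eta": [60, 300, 120, 90, 240],  # numerical (seconds)
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: the right choice for a numerical distribution.
axes[0].hist(df["m_order_eta"], bins=5)
axes[0].set_title("ETA distribution")

# Bar plot: the right choice for a categorical distribution.
counts = df["order_status_key"].value_counts()
axes[1].bar(counts.index.astype(str), counts.values)
axes[1].set_title("Order status counts")

fig.savefig("eda_plots.png")
```

ChatGPT applies the same mapping when it picks the six plots for this dataset, which is why grounding it in the article improves the output.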

 

Step 4: Make Your Dataset Ready for Machine Learning

Now that we have handled missing values and explored the dataset, the next step is to prepare it for machine learning. This involves steps like encoding categorical variables and scaling numerical features.

Here is our prompt.

Prepare this dataset for machine learning: encode categorical variables, scale numerical features, and return a clean DataFrame ready for modeling. Briefly explain each step.

 

Here is the output.

 
Make your Dataset Ready for Machine Learning
 

Now your features have been scaled and encoded, so your dataset is ready to apply a machine learning model.
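Under the hood, the preparation ChatGPT performs amounts to one-hot encoding plus standard scaling. A minimal sketch, assuming column names from the Gett description and toy values:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy rows; column names follow the Gett data description.
df = pd.DataFrame({
    "is_driver_assigned_key": [0, 1, 1, 0],  # categorical (binary)
    "order_status_key": [4, 9, 4, 9],        # target codes, left as-is
    "m_order_eta": [60.0, 300.0, 120.0, 240.0],
})

# One-hot encode categorical features (drop_first avoids redundancy).
df = pd.get_dummies(df, columns=["is_driver_assigned_key"], drop_first=True)

# Scale numerical features to zero mean and unit variance.
scaler = StandardScaler()
df[["m_order_eta"]] = scaler.fit_transform(df[["m_order_eta"]])

print(df.head())
```

Keeping the target column unencoded, as here, matters because the next step will predict it directly.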

 

Step 5: Applying a Machine Learning Model

Let’s move on to machine learning modeling. We will use the following prompt structure to apply a basic machine learning model.

Use this dataset to predict [target variable]. Apply [model type] and report machine learning evaluation metrics like [accuracy, precision, recall, F1-score]. Use only the 5 most relevant features and explain your modeling steps.

 

Let’s update this prompt based on our project.

Use this dataset to predict order_status_key. Apply a multiclass classification model (e.g., Random Forest), and report evaluation metrics like accuracy, precision, recall, and F1-score. Use only the 5 most relevant features and explain your modeling steps.

 

Now, paste this into the ongoing conversation and review the output.

Here is the output.

 
Applying Machine Learning Model
 

As you can see, the model performed well; perhaps too well. Near-perfect scores on a handful of features often hint at data leakage, so it is worth double-checking which features the model was given.
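The modeling step ChatGPT runs follows a standard scikit-learn recipe. This sketch uses synthetic data (the real run uses the five features selected from the Gett dataset), with a Random Forest and the metrics the prompt asks for:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic stand-in: 5 features and a two-class target mimicking
# order_status_key; only the first feature carries real signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = np.where(X[:, 0] + rng.normal(scale=0.5, size=300) > 0, 4, 9)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Accuracy, precision, recall, and F1 per class in one report.
print(classification_report(y_test, model.predict(X_test)))
```

Holding out a test split, as above, is also the quickest sanity check against suspiciously perfect scores.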

 

Bonus: Gemini CLI

 
Gemini has launched an open-source agent that you can interact with from your terminal, with a generous free tier (60 model requests per minute and 1,000 requests per day at no charge).

Besides ChatGPT, you can also use Gemini CLI to handle routine data science tasks, such as cleaning, exploration, and even building a dashboard to automate these tasks.

The Gemini CLI provides a straightforward command-line interface. Let’s start by installing it using the command below.

sudo npm install -g @google/gemini-cli

 

After running the command above, open your terminal and type the following to start building with it:

gemini

Once you run the commands above, you’ll see the Gemini CLI as shown in the screenshot below.

 
Gemini CLI
 

Gemini CLI lets you run code, ask questions, or even build apps directly from your terminal. In this case, we will use Gemini CLI to build a Streamlit app that automates everything we’ve done so far: EDA, cleaning, visualization, and modeling.

To build a Streamlit app, we will use a prompt that covers all steps. It’s shown below.

Build a Streamlit app that automates EDA and data cleaning, creates automatic data visualizations, prepares the dataset for machine learning, and applies a machine learning model after the user selects a target variable.

Step 1 – Basic EDA:
• Display .head(), .info(), and .describe()
• Show missing values per column
• Show correlation heatmap of numerical features
Step 2 – Data Cleaning:
• Detect columns with missing values
• Handle missing data appropriately (drop or impute)
• Display a summary of cleaning actions taken
Step 3 – Auto Visualizations
• Before plotting, use these visualization principles:
• Use histograms for numerical distributions
• Use bar plots for categorical distributions
• Use boxplots or violin plots to compare categories
• Use scatter plots for numerical relationships
• Use correlation heatmaps for multicollinearity
• Use line plots for time series (if applicable)
• Generate the most relevant plots for this dataset
• Explain why each plot was chosen
Step 4 – Machine Learning Preparation:
• Encode variables
• Scale numerical features
• Return a clean DataFrame ready for modeling
Step 5 – Apply Machine Learning Model:
• Offer the target variable to the user.
• Apply multiple machine learning models.
• Report evaluation metrics.
Each step should display in a different tab. Run the Streamlit app after you build it.
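The pipeline the prompt describes can be sketched as plain functions, which is roughly what Gemini CLI generates and then wraps in Streamlit tabs. The helper names below are hypothetical, and the demo frame is a toy; the real app operates on whatever dataset the user uploads.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def explore(df):
    """Step 1: return missing-value counts as a quick EDA summary."""
    return df.isna().sum()

def clean(df):
    """Step 2: impute numeric NaNs with the column median."""
    out = df.copy()
    num = out.select_dtypes("number").columns
    out[num] = out[num].fillna(out[num].median())
    return out

def prepare(df, target):
    """Step 4: one-hot encode features, leaving the target untouched."""
    X = pd.get_dummies(df.drop(columns=[target]))
    return X, df[target]

def model(X, y):
    """Step 5: fit a baseline classifier and report training accuracy."""
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    return clf.score(X, y)

# Tiny demo run on a toy frame (the app would use the uploaded data).
df = clean(pd.DataFrame({
    "eta": [60.0, None, 120.0, 90.0],
    "assigned": ["yes", "no", "yes", "no"],
    "status": [4, 9, 4, 9],
}))
X, y = prepare(df, "status")
print(model(X, y))
```

In the generated app, each function feeds one tab, and a `st.selectbox` would let the user pick the target column before the modeling step runs.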

 

It will ask for permission when it creates directories or runs code in your terminal.

 
Gemini CLI
 

After a few approval steps, the Streamlit app will be ready, as shown below.

 
Gemini CLI
 

Now, let’s test it.

 
Gemini CLI

 

Final Thoughts

 
In this article, we first used ChatGPT to handle routine tasks such as data cleaning, exploration, and data visualization. Next, we went one step further, using it to prepare our dataset for machine learning and to apply a machine learning model.

Finally, we used Gemini CLI to create a Streamlit dashboard that performs all of these steps with just a click.

To demonstrate all of this, we have used a data project from Gett. Although AI is not yet entirely reliable for every task, you can leverage it to handle routine tasks, saving you a lot of time.
 
 

Nate Rosidi is a data scientist and product strategist. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.


