
Python for Data Science (Free 7-Day Mini-Course)


 

Introduction

 
Welcome to Python for Data Science, a free 7-day mini-course for beginners! If you’re starting out with data science or want to learn basic Python skills, this beginner-friendly course is for you. Over the next seven days, you’ll learn how to work on data tasks using only core Python.

You’ll learn how to:

  • Work with fundamental Python data structures
  • Clean and prepare messy text data
  • Summarize and group data with dictionaries (just like you do in SQL or Excel)
  • Write reusable functions that keep your code neat and efficient
  • Handle errors gracefully so your scripts don’t crash on messy input data
  • And finally, build a simple data profiling tool to inspect any CSV dataset

Let’s get started!

Link to the code on GitHub

 

Day 1: Variables, Data Types, and File I/O

 
In data science, everything starts with raw data: survey responses, logs, spreadsheets, forms, scraped websites, etc. Before you can model or analyze anything, you need to:

  • Load the data
  • Understand its shape and types
  • Begin to clean or inspect it

Today, you’ll learn:

  • The basic Python data types
  • How to read and write raw .txt files

 

// 1. Variables

In Python, a variable is a named reference to a value. In data terms, you can think of variables as fields, columns, or metadata.

filename = "responses.txt"
survey_name = "Q3 Customer Feedback"
max_entries = 100

 

// 2. Data Types You’ll Use Often

Don’t worry about obscure types just yet. You’ll mostly use the following:

Python Type   What It’s Used For            Example
str           Raw text, column names        “age”, “unknown”
int           Counts, discrete variables    42, 0, -3
float         Continuous variables          3.14, 0.0, -100.5
bool          Flags / binary outcomes       True, False
None          Missing/null values           None

Knowing when you’re dealing with each — and how to check or convert them — is step zero in data cleaning.
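
For example, here is a quick, self-contained sketch (not tied to any of the course files) of checking and converting types as you read raw values:

raw_age = "42"             # values read from text files always arrive as strings
print(type(raw_age))       # <class 'str'>

age = int(raw_age)         # convert to an integer for counting and math
height = float("1.75")     # convert to a float for continuous values
is_adult = age >= 18       # a bool flag derived from the data

response = ""              # treat an empty string as a missing value
value = response if response else None
print(age, height, is_adult, value)   # 42 1.75 True None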
 

// 3. File Input: Reading Raw Data

Most real-world data lives in .txt, .csv, or .log files. You’ll often need to load them line by line rather than all at once (especially if they’re large).

Let’s say you have a file called responses.txt:
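
Yes
No
Yes
Maybe
No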

 

Here’s how you read it:

with open("responses.txt", "r") as file:
    lines = file.readlines()

for i, line in enumerate(lines):
    cleaned = line.strip()  # removes \n and spaces
    print(f"{i + 1}: {cleaned}")

 

Output:

1: Yes
2: No
3: Yes
4: Maybe
5: No
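
By the way, readlines() pulls the whole file into memory. For large files, you can iterate over the file object directly instead; a minimal variant of the same loop:

with open("responses.txt", "r") as file:
    for i, line in enumerate(file):   # reads one line at a time
        print(f"{i + 1}: {line.strip()}")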

 

// 4. File Output: Writing Processed Data

Let’s say you want to save only “Yes” responses to a new file:

with open("responses.txt", "r") as infile:
    lines = infile.readlines()

yes_responses = []

for line in lines:
    if line.strip().lower() == "yes":
        yes_responses.append(line.strip())

with open("yes_only.txt", "w") as outfile:
    for item in yes_responses:
        outfile.write(item + "\n")
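
With the responses.txt shown earlier, yes_only.txt ends up containing:

Yes
Yes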

 

This is a super simple version of a filter-transform-save pipeline, a concept used daily in data preprocessing.
 

//  Exercise: Write Your First Data Script

Create a file called survey.txt with a handful of yes/no/maybe responses, one per line (like responses.txt above).

Now write a Python script that:

  1. Reads the file
  2. Counts how many times “yes” appears (case-insensitive). You’ll learn more about working with strings on Day 3, but do give it a go!
  3. Prints the count
  4. Writes a clean version of the data (capitalized, no whitespace) to cleaned_survey.txt

 

Day 2: Basic Python Data Structures

 
Data science is all about organizing and structuring data so it can be cleaned, analyzed, or modeled. Today you’ll learn the four essential data structures in core Python and how to use them for actual data tasks:

  • list: for sequences of rows
  • tuple: for fixed-position records
  • dict: for labeled data (like columns)
  • set: for tracking unique values

 

// 1. List: For Sequences of Data Rows

Lists are the most flexible and common structure, suitable for representing:

  • A column of values
  • A collection of records
  • A dataset with unknown size

Example: Read values from a file into a list.
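
Suppose scores.txt holds one numeric value per line, for example:

85.0
90.5
78.25
92.0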

with open("scores.txt", "r") as file:
    scores = [float(line.strip()) for line in file]

print(scores)

 
This prints:

[85.0, 90.5, 78.25, 92.0]


You can now:

average = sum(scores) / len(scores)
print(f"Average score: {average:.2f}")

 
Output:

Average score: 86.44


// 2. Tuple: For Fixed-Structure Records

Tuples are like lists, but immutable and best used for rows with known structure, e.g., (name, age).

Example: Read a file of names and ages.
Suppose we have the following people.txt:

Alice, 34
Bob, 29
Eve, 41

 
Now let’s read in the contents of the file:

with open("people.txt", "r") as file:
    records = []
    for line in file:
        name, age = line.strip().split(",")
        records.append((name.strip(), int(age.strip())))

 
Now you can access fields by position:

for person in records:
    name, age = person
    if age > 30:
        print(f"{name} is over 30.")

 

// 3. Dict: For Labeled Data (Like Columns)

Dictionaries store key-value pairs, the closest thing in core Python to a table row with named columns.

Example: Convert each person record into a dict:

people = []

with open("people.txt", "r") as file:
    for line in file:
        name, age = line.strip().split(",")
        person = {
            "name": name.strip(),
            "age": int(age.strip())
        }
        people.append(person)

 

Now your data is much more readable and flexible:

for person in people:
    if person["age"] < 60:
        print(f"{person['name']} is perhaps a working professional.")

 

// 4. Set: For Uniqueness & Fast Membership Checks

Sets automatically remove duplicates, which makes them great for:

  • Counting unique categories
  • Checking if a value has been seen before
  • Tracking distinct values without order

Example: From a file of emails, find all unique domains.
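
Assume emails.txt contains one address per line, something like:

alice@gmail.com
BOB@Yahoo.com
carol@example.org
dave@gmail.com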

domains = set()

with open("emails.txt", "r") as file:
    for line in file:
        email = line.strip().lower()
        if "@" in email:
            domain = email.split("@")[1]
            domains.add(domain)

print(domains) 

 
Output:

{'gmail.com', 'yahoo.com', 'example.org'}

 

//  Exercise: Code a Mini Data Inspector

Create a file called dataset.txt with one person per line in the format name,age,role (for example: Alice,34,Engineer).

Now write a Python script that:

  1. Reads each line and stores it as a dictionary with keys: name, age, role
  2. Counts how many people are in each role (use a dictionary) and the number of unique ages (use a set)

 

Day 3: Working with Strings

 
Text strings are everywhere in most real-world datasets — survey responses, user bios, job titles, product reviews, emails, and more — but they’re also inconsistent and unpredictable.

Today, you’ll learn to:

  • Clean and standardize raw text
  • Extract information from strings
  • Build simple text-based features (the kind you can use for filtering or modeling)

// 1. Basic String Cleaning

Let’s say you get this raw list of job titles from a CSV:

titles = [
    "  Data Scientist\n",
    "data scientist",
    "Senior Data Scientist ",
    "DATA scientist",
    "Data engineer",
    "Data Scientist"
]

 

Your job? Normalize it.

cleaned = [title.strip().lower() for title in titles]

 

Now everything is lowercase and whitespace-free.

 
Output:

['data scientist', 'data scientist', 'senior data scientist', 'data scientist', 'data engineer', 'data scientist']

 

// 2. Standardizing Values

Let’s say you’re only interested in identifying data scientists.

standardized = []

for title in cleaned:
    if "data scientist" in title:
        standardized.append("data scientist")
    else:
        standardized.append(title)
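
Every variant containing “data scientist” collapses into a single label, while other titles pass through unchanged:

print(standardized)

Output:

['data scientist', 'data scientist', 'data scientist', 'data scientist', 'data engineer', 'data scientist']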

 

// 3. Counting Words, Checking Patterns

Useful text features:

  • Number of words
  • Whether a string contains a keyword
  • Whether a string is a number or email

Example:

text = " The price is $5,000!  "

# Clean up
clean = text.strip().lower().replace("$", "").replace(",", "").replace("!", "")
print(clean)  

# Word count
word_count = len(clean.split())

# Contains digit
has_number = any(char.isdigit() for char in clean)

print(word_count)
print(has_number)

 
Output:

"the price is 5000"
4
True

 

// 4. Splitting and Extracting Parts

Let’s take the email example:

email = "  Alice.Johnson@Example.com  "
email = email.strip().lower()

username, domain = email.split("@")

print(f"User: {username}, Domain: {domain}")

 
This prints:

User: alice.johnson, Domain: example.com

 
This kind of extraction is used in user behavior analysis, spam detection, and the like.
 

// 5. Detecting Specific Text Patterns

You don’t need regular expressions for basic pattern checks.

Example: Check if someone mentioned “python” in a free-text response:

comment = "I'm learning Python and SQL for data jobs."

if "python" in comment.lower():
    print("Mentioned Python")

 

//  Exercise: Clean Survey Comments

Create a file called comments.txt with the following lines:

Great course! Loved the pacing.
Not enough Python examples.
Too basic for experienced users.
python is exactly what I needed!
Would like more SQL content.
Excellent – very beginner-friendly.

 

Now write a Python script that:

  1. Cleans each comment (strip, lowercase, remove punctuation)
  2. Prints the total number of comments, how many mention “python”, and the average word count per comment

 

Day 4: Group, Count, & Summarize with Dictionaries

 
You’ve used dict to store labeled records. Today, you’ll go a level deeper: using dictionaries to group, count, and summarize data — just like a pivot table or GROUP BY in SQL.
 

// 1. Grouping by a Field

Let’s say you have this data:

data = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Eve", "city": "London"},
    {"name": "John", "city": "New York"},
    {"name": "Dana", "city": "Paris"},
]

 
Goal: Count how many people are in each city.

city_counts = {}

for person in data:
    city = person["city"]
    if city not in city_counts:
        city_counts[city] = 1
    else:
        city_counts[city] += 1

print(city_counts)

 
Output:

{'London': 2, 'Paris': 2, 'New York': 1}

 

// 2. Summing a Field by Category

Now let’s say we have:

salaries = [
    {"role": "Engineer", "salary": 75000},
    {"role": "Analyst", "salary": 62000},
    {"role": "Engineer", "salary": 80000},
    {"role": "Manager", "salary": 95000},
    {"role": "Analyst", "salary": 64000},
]

 
Goal: Calculate total and average salary per role.

totals = {}
counts = {}

for person in salaries:
    role = person["role"]
    salary = person["salary"]
    
    totals[role] = totals.get(role, 0) + salary
    counts[role] = counts.get(role, 0) + 1

averages = {role: totals[role] / counts[role] for role in totals}

print(averages)

 
Output:

{'Engineer': 77500.0, 'Analyst': 63000.0, 'Manager': 95000.0}

 

// 3. Frequency Table (Mode Detection)

Find the most common age in a dataset:

ages = [29, 34, 29, 41, 34, 29]

freq = {}

for age in ages:
    freq[age] = freq.get(age, 0) + 1

most_common = max(freq.items(), key=lambda x: x[1])

print(f"Most common age: {most_common[0]} (appears {most_common[1]} times)")

 
Output:

Most common age: 29 (appears 3 times)

 

//  Exercise: Analyze Employee Dataset

Create a file employees.txt with the following content:

Alice,London,Engineer,75000
Bob,Paris,Analyst,62000
Eve,London,Engineer,80000
John,New York,Manager,95000
Dana,Paris,Analyst,64000

 

Write a Python script that:

  1. Loads the data into a list of dictionaries
  2. Prints the number of employees per city and the average salary per role

 

Day 5: Writing Functions

 
You’ve written code that loads, cleans, filters, and summarizes data. Now you’ll package that logic into functions, so you can:

  • Reuse your code
  • Build processing pipelines
  • Keep scripts readable and testable

 

// 1. Cleaning Text Inputs

Let’s write a function to perform basic text cleaning:

def clean_text(text):
    return text.strip().lower().replace(",", "").replace("$", "")

 

Now you can apply this to every field you read from a file.
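
For example, applied to a few illustrative raw values:

raw_values = ["  $5,000 ", " London,  ", "Data Scientist"]
print([clean_text(v) for v in raw_values])

Output:

['5000', 'london', 'data scientist']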
 

// 2. Creating Row Records

Next, here’s a simple function to parse each row in a file and create a record:

def parse_row(line):
    parts = line.strip().split(",")
    return {
        "name": parts[0],
        "city": parts[1],
        "role": parts[2],
        "salary": int(parts[3])
    }

 

Now your file loading becomes:

with open("employees.txt") as file:
    rows = [parse_row(line) for line in file]

 

// 3. Aggregation Helpers

So far, you’ve computed averages and counts of occurrences. Let’s write basic helper functions for these tasks:

def average(values):
    return sum(values) / len(values) if values else 0

def count_by_key(data, key):
    counts = {}
    for item in data:
        k = item[key]
        counts[k] = counts.get(k, 0) + 1
    return counts
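
For instance, with the rows loaded above from employees.txt (the Day 4 file), the helpers work like this:

salaries = [row["salary"] for row in rows]
print(average(salaries))
print(count_by_key(rows, "city"))

Output:

75200.0
{'London': 2, 'Paris': 2, 'New York': 1}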

 

//  Exercise: Modularize Previous Work

Refactor yesterday’s solution into reusable functions:

  • load_data(filename)
  • average_salary_by_role(data)
  • count_by_city(data)

Then use them in a script that prints the same output as Day 4.

 

Day 6: Reading, Writing, and Basic Error-Handling

 
Data files are often incomplete, corrupted, or misformatted. So how do you deal with them?

Today you’ll learn:

  • How to read and write structured files
  • How to gracefully handle errors
  • How to skip or log bad rows without crashing

 

// 1. Safer File Reading

What happens when you try reading a file that doesn’t exist? Here’s how you try opening the file and catch a FileNotFoundError if it isn’t there.

try:
    with open("employees.txt") as file:
        lines = file.readlines()
except FileNotFoundError:
    print("Error: File not found.")
    lines = []

 

// 2. Handling Bad Rows Gracefully

Now let’s try to skip bad rows and process only the complete rows.

records = []

for line in lines:
    try:
        parts = line.strip().split(",")
        if len(parts) != 4:
            raise ValueError("Incorrect number of fields")
        record = {
            "name": parts[0],
            "city": parts[1],
            "role": parts[2],
            "salary": int(parts[3])
        }
        records.append(record)
    except Exception as e:
        print(f"Skipping bad line: {line.strip()} ({e})")

 

// 3. Writing Cleaned Data to a File

Finally, let’s write the cleaned data to a file.

with open("cleaned_employees.txt", "w") as out:
    for r in records:
        out.write(f"{r['name']},{r['city']},{r['role']},{r['salary']}\n")

 

//  Exercise: Make a Fault-Tolerant Loader

Create a file raw_employees.txt with a couple of incomplete or messy lines like:

Alice,London,Engineer,75000
Bob,Paris,Analyst
Eve,London,Engineer,eighty thousand
John,New York,Manager,95000

 
Write a script that:

  1. Loads only valid records
  2. Prints number of valid rows
  3. Writes them to validated_employees.txt

 

Day 7: Build a Mini Data Profiler (Project Day)

 
Great work on making it this far. Today, you’ll create a standalone Python script that:

  • Loads a CSV file
  • Detects column names and types
  • Computes useful stats
  • Writes a summary report

 

// Step-by-Step Outline

1. Load the file:

def load_csv(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]
    header = lines[0].split(",")
    rows = [line.split(",") for line in lines[1:]]
    return header, rows
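
One caveat: splitting on commas by hand breaks on quoted fields that themselves contain commas. If you ever hit that, the standard library’s csv module handles quoting for you; a drop-in sketch of the same loader:

import csv

def load_csv(filename):
    with open(filename, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    return rows[0], rows[1:]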

 

2. Detect column types:

def detect_type(value):
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "text"

 

3. Profile each column:

def profile_columns(header, rows):
    summary = {}
    for i, col in enumerate(header):
        values = [row[i].strip() for row in rows if len(row) == len(header)]
        col_type = detect_type(values[0])
        unique = set(values)
        summary[col] = {
            "type": col_type,
            "unique_count": len(unique),
            "most_common": max(set(values), key=values.count)
        }
        if col_type == "numeric":
            nums = [float(v) for v in values if v.replace('.', '', 1).isdigit()]
            summary[col]["average"] = sum(nums) / len(nums) if nums else 0
    return summary

 

4. Write the summary report:

def write_summary(summary, out_file):
    with open(out_file, "w") as f:
        for col, stats in summary.items():
            f.write(f"Column: {col}\n")
            for k, v in stats.items():
                f.write(f"  {k}: {v}\n")
            f.write("\n")

 

You can use the functions like so:

header, rows = load_csv("employees.csv")
summary = profile_columns(header, rows)
write_summary(summary, "profile_report.txt")

 

//  Final Exercise

Use your own CSV file (or reuse earlier ones). Run the profiler and check the output.

 

Conclusion

 
Congratulations! You’ve completed the Python for Data Science mini-course.

Over this week, you’ve moved from basic Python data structures to writing modular functions and scripts that handle real data problems. These are the basics, and by that I mean really basic stuff. Use this as a starting point and learn more about Python’s standard library (by doing, of course).
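
For instance, collections.Counter and statistics.mean replace the counting and averaging you wrote by hand this week (shown here on the Day 4 salary figures):

from collections import Counter
from statistics import mean

roles = ["Engineer", "Analyst", "Engineer", "Manager", "Analyst"]
print(Counter(roles))
print(mean([75000, 62000, 80000, 95000, 64000]))

Output:

Counter({'Engineer': 2, 'Analyst': 2, 'Manager': 1})
75200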

Thank you for learning with me. Happy coding and data crunching ahead!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


