
Python for Data Science (Free 7-Day Mini-Course)


 

Introduction

 
Welcome to Python for Data Science, a free 7-day mini-course for beginners! If you’re starting out with data science or want to learn basic Python skills, this beginner-friendly course is for you. Over the next seven days, you’ll learn how to work on data tasks using only core Python.

You’ll learn how to:

  • Work with fundamental Python data structures
  • Clean and prepare messy text data
  • Summarize and group data with dictionaries (just like you do in SQL or Excel)
  • Write reusable functions that keep your code neat and efficient
  • Handle errors gracefully so your scripts don’t crash on messy input data
  • And finally, build a simple data profiling tool to inspect any CSV dataset

Let’s get started!

Link to the code on GitHub

 

Day 1: Variables, Data Types, and File I/O

 
In data science, everything starts with raw data: survey responses, logs, spreadsheets, forms, scraped websites, etc. Before you can model or analyze anything, you need to:

  • Load the data
  • Understand its shape and types
  • Begin to clean or inspect it

Today, you’ll learn:

  • The basic Python data types
  • How to read and write raw .txt files

 

// 1. Variables

In Python, a variable is a named reference to a value. In data terms, you can think of variables as fields, columns, or metadata.

filename = "responses.txt"
survey_name = "Q3 Customer Feedback"
max_entries = 100

 

// 2. Data Types You’ll Use Often

Don’t worry about obscure types just yet. You’ll mostly use the following:

Python Type   What It’s Used For            Example
str           Raw text, column names        “age”, “unknown”
int           Counts, discrete variables    42, 0, -3
float         Continuous variables          3.14, 0.0, -100.5
bool          Flags / binary outcomes       True, False
None          Missing/null values           None

Knowing when you’re dealing with each — and how to check or convert them — is step zero in data cleaning.
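
For example, here is a quick, self-contained sketch (not tied to any of the course files) of checking and converting types as you read raw values:

raw_age = "42"             # values read from text files always arrive as strings
print(type(raw_age))       # <class 'str'>

age = int(raw_age)         # convert to an integer for counting and math
height = float("1.75")     # convert to a float for continuous values
is_adult = age >= 18       # a bool flag derived from the data

response = ""              # treat an empty string as a missing value
value = response if response else None
print(age, height, is_adult, value)   # 42 1.75 True None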
 

// 3. File Input: Reading Raw Data

Most real-world data lives in .txt, .csv, or .log files. You’ll often need to load them line by line rather than all at once (especially if they’re large).

Let’s say you have a file called responses.txt:
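
Yes
No
Yes
Maybe
No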

 

Here’s how you read it:

with open("responses.txt", "r") as file:
    lines = file.readlines()

for i, line in enumerate(lines):
    cleaned = line.strip()  # removes \n and spaces
    print(f"{i + 1}: {cleaned}")

 

Output:

1: Yes
2: No
3: Yes
4: Maybe
5: No
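
By the way, readlines() pulls the whole file into memory. For large files, you can iterate over the file object directly instead; a minimal variant of the same loop:

with open("responses.txt", "r") as file:
    for i, line in enumerate(file):   # reads one line at a time
        print(f"{i + 1}: {line.strip()}")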

 

// 4. File Output: Writing Processed Data

Let’s say you want to save only “Yes” responses to a new file:

with open("responses.txt", "r") as infile:
    lines = infile.readlines()

yes_responses = []

for line in lines:
    if line.strip().lower() == "yes":
        yes_responses.append(line.strip())

with open("yes_only.txt", "w") as outfile:
    for item in yes_responses:
        outfile.write(item + "\n")
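
With the responses.txt shown earlier, yes_only.txt ends up containing:

Yes
Yes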

 

This is a super simple version of a filter-transform-save pipeline, a concept used daily in data preprocessing.
 

//  Exercise: Write Your First Data Script

Create a file called survey.txt with a handful of yes/no/maybe responses, one per line (like responses.txt above).

Now write a Python script that:

  1. Reads the file
  2. Counts how many times “yes” appears (case-insensitive). You’ll learn more about working with strings on Day 3, but do give it a go!
  3. Prints the count
  4. Writes a clean version of the data (capitalized, no whitespace) to cleaned_survey.txt

 

Day 2: Basic Python Data Structures

 
Data science is all about organizing and structuring data so it can be cleaned, analyzed, or modeled. Today you’ll learn the four essential data structures in core Python and how to use them for actual data tasks:

  • list: for sequences of rows
  • tuple: for fixed-position records
  • dict: for labeled data (like columns)
  • set: for tracking unique values

 

// 1. List: For Sequences of Data Rows

Lists are the most flexible and common structure, suitable for representing:

  • A column of values
  • A collection of records
  • A dataset with unknown size

Example: Read values from a file into a list.
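
Suppose scores.txt holds one numeric value per line, for example:

85.0
90.5
78.25
92.0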

with open("scores.txt", "r") as file:
    scores = [float(line.strip()) for line in file]

print(scores)

 
This prints:

[85.0, 90.5, 78.25, 92.0]


You can now:

average = sum(scores) / len(scores)
print(f"Average score: {average:.2f}")

 
Output:

Average score: 86.44


// 2. Tuple: For Fixed-Structure Records

Tuples are like lists, but immutable and best used for rows with known structure, e.g., (name, age).

Example: Read a file of names and ages.
Suppose we have the following people.txt:

Alice, 34
Bob, 29
Eve, 41

 
Now let’s read in the contents of the file:

with open("people.txt", "r") as file:
    records = []
    for line in file:
        name, age = line.strip().split(",")
        records.append((name.strip(), int(age.strip())))

 
Now you can access fields by position:

for person in records:
    name, age = person
    if age > 30:
        print(f"{name} is over 30.")

 

// 3. Dict: For Labeled Data (Like Columns)

Dictionaries store key-value pairs, the closest thing in core Python to a table row with named columns.

Example: Convert each person record into a dict:

people = []

with open("people.txt", "r") as file:
    for line in file:
        name, age = line.strip().split(",")
        person = {
            "name": name.strip(),
            "age": int(age.strip())
        }
        people.append(person)

 

Now your data is much more readable and flexible:

for person in people:
    if person["age"] < 60:
        print(f"{person['name']} is perhaps a working professional.")

 

// 4. Set: For Uniqueness & Fast Membership Checks

Sets automatically remove duplicates, which makes them great for:

  • Counting unique categories
  • Checking if a value has been seen before
  • Tracking distinct values without order

Example: From a file of emails, find all unique domains.
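
Assume emails.txt contains one address per line, something like:

alice@gmail.com
BOB@Yahoo.com
carol@example.org
dave@gmail.com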

domains = set()

with open("emails.txt", "r") as file:
    for line in file:
        email = line.strip().lower()
        if "@" in email:
            domain = email.split("@")[1]
            domains.add(domain)

print(domains) 

 
Output:

{'gmail.com', 'yahoo.com', 'example.org'}

 

//  Exercise: Code a Mini Data Inspector

Create a file called dataset.txt with one person per line in the format name,age,role (for example: Alice,34,Engineer).

Now write a Python script that:

  1. Reads each line and stores it as a dictionary with keys: name, age, role
  2. Counts how many people are in each role (use a dictionary) and the number of unique ages (use a set)

 

Day 3: Working with Strings

 
Text strings are everywhere in most real-world datasets — survey responses, user bios, job titles, product reviews, emails, and more — but they’re also inconsistent and unpredictable.

Today, you’ll learn to:

  • Clean and standardize raw text
  • Extract information from strings
  • Build simple text-based features (the kind you can use for filtering or modeling)

// 1. Basic String Cleaning

Let’s say you get this raw list of job titles from a CSV:

titles = [
    "  Data Scientist\n",
    "data scientist",
    "Senior Data Scientist ",
    "DATA scientist",
    "Data engineer",
    "Data Scientist"
]

 

Your job? Normalize it.

cleaned = [title.strip().lower() for title in titles]

 

Now everything is lowercase and whitespace-free.

 
Output:

['data scientist', 'data scientist', 'senior data scientist', 'data scientist', 'data engineer', 'data scientist']

 

// 2. Standardizing Values

Let’s say you’re only interested in identifying data scientists.

standardized = []

for title in cleaned:
    if "data scientist" in title:
        standardized.append("data scientist")
    else:
        standardized.append(title)
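
Every variant containing “data scientist” collapses into a single label, while other titles pass through unchanged:

print(standardized)

Output:

['data scientist', 'data scientist', 'data scientist', 'data scientist', 'data engineer', 'data scientist']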

 

// 3. Counting Words, Checking Patterns

Useful text features:

  • Number of words
  • Whether a string contains a keyword
  • Whether a string is a number or email

Example:

text = " The price is $5,000!  "

# Clean up
clean = text.strip().lower().replace("$", "").replace(",", "").replace("!", "")
print(clean)  

# Word count
word_count = len(clean.split())

# Contains digit
has_number = any(char.isdigit() for char in clean)

print(word_count)
print(has_number)

 
Output:

"the price is 5000"
4
True

 

// 4. Splitting and Extracting Parts

Let’s take the email example:

email = "  Alice.Johnson@Example.com  "
email = email.strip().lower()

username, domain = email.split("@")

print(f"User: {username}, Domain: {domain}")

 
This prints:

User: alice.johnson, Domain: example.com

 
This kind of extraction is used in user behavior analysis, spam detection, and the like.
 

// 5. Detecting Specific Text Patterns

You don’t need regular expressions for basic pattern checks.

Example: Check if someone mentioned “python” in a free-text response:

comment = "I'm learning Python and SQL for data jobs."

if "python" in comment.lower():
    print("Mentioned Python")

 

//  Exercise: Clean Survey Comments

Create a file called comments.txt with the following lines:

Great course! Loved the pacing.
Not enough Python examples.
Too basic for experienced users.
python is exactly what I needed!
Would like more SQL content.
Excellent – very beginner-friendly.

 

Now write a Python script that:

  1. Cleans each comment (strip, lowercase, remove punctuation)
  2. Prints the total number of comments, how many mention “python”, and the average word count per comment

 

Day 4: Group, Count, & Summarize with Dictionaries

 
You’ve used dict to store labeled records. Today, you’ll go a level deeper: using dictionaries to group, count, and summarize data — just like a pivot table or GROUP BY in SQL.
 

// 1. Grouping by a Field

Let’s say you have this data:

data = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Eve", "city": "London"},
    {"name": "John", "city": "New York"},
    {"name": "Dana", "city": "Paris"},
]

 
Goal: Count how many people are in each city.

city_counts = {}

for person in data:
    city = person["city"]
    if city not in city_counts:
        city_counts[city] = 1
    else:
        city_counts[city] += 1

print(city_counts)

 
Output:

{'London': 2, 'Paris': 2, 'New York': 1}

 

// 2. Summing a Field by Category

Now let’s say we have:

salaries = [
    {"role": "Engineer", "salary": 75000},
    {"role": "Analyst", "salary": 62000},
    {"role": "Engineer", "salary": 80000},
    {"role": "Manager", "salary": 95000},
    {"role": "Analyst", "salary": 64000},
]

 
Goal: Calculate total and average salary per role.

totals = {}
counts = {}

for person in salaries:
    role = person["role"]
    salary = person["salary"]
    
    totals[role] = totals.get(role, 0) + salary
    counts[role] = counts.get(role, 0) + 1

averages = {role: totals[role] / counts[role] for role in totals}

print(averages)

 
Output:

{'Engineer': 77500.0, 'Analyst': 63000.0, 'Manager': 95000.0}

 

// 3. Frequency Table (Mode Detection)

Find the most common age in a dataset:

ages = [29, 34, 29, 41, 34, 29]

freq = {}

for age in ages:
    freq[age] = freq.get(age, 0) + 1

most_common = max(freq.items(), key=lambda x: x[1])

print(f"Most common age: {most_common[0]} (appears {most_common[1]} times)")

 
Output:

Most common age: 29 (appears 3 times)

 

//  Exercise: Analyze Employee Dataset

Create a file employees.txt with the following content:

Alice,London,Engineer,75000
Bob,Paris,Analyst,62000
Eve,London,Engineer,80000
John,New York,Manager,95000
Dana,Paris,Analyst,64000

 

Write a Python script that:

  1. Loads the data into a list of dictionaries
  2. Prints the number of employees per city and the average salary per role

 

Day 5: Writing Functions

 
You’ve written code that loads, cleans, filters, and summarizes data. Now you’ll package that logic into functions, so you can:

  • Reuse your code
  • Build processing pipelines
  • Keep scripts readable and testable

 

// 1. Cleaning Text Inputs

Let’s write a function to perform basic text cleaning:

def clean_text(text):
    return text.strip().lower().replace(",", "").replace("$", "")

 

Now you can apply this to every field you read from a file.
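
For example, applied to a few illustrative raw values:

raw_values = ["  $5,000 ", " London,  ", "Data Scientist"]
print([clean_text(v) for v in raw_values])

Output:

['5000', 'london', 'data scientist']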
 

// 2. Creating Row Records

Next, here’s a simple function to parse each row in a file and create a record:

def parse_row(line):
    parts = line.strip().split(",")
    return {
        "name": parts[0],
        "city": parts[1],
        "role": parts[2],
        "salary": int(parts[3])
    }

 

Now your file loading becomes:

with open("employees.txt") as file:
    rows = [parse_row(line) for line in file]

 

// 3. Aggregation Helpers

So far, you’ve computed averages and counts of occurrences. Let’s write basic helper functions for these tasks:

def average(values):
    return sum(values) / len(values) if values else 0

def count_by_key(data, key):
    counts = {}
    for item in data:
        k = item[key]
        counts[k] = counts.get(k, 0) + 1
    return counts
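
For instance, with the rows loaded above from employees.txt (the Day 4 file), the helpers work like this:

salaries = [row["salary"] for row in rows]
print(average(salaries))
print(count_by_key(rows, "city"))

Output:

75200.0
{'London': 2, 'Paris': 2, 'New York': 1}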

 

//  Exercise: Modularize Previous Work

Refactor yesterday’s solution into reusable functions:

  • load_data(filename)
  • average_salary_by_role(data)
  • count_by_city(data)

Then use them in a script that prints the same output as Day 4.

 

Day 6: Reading, Writing, and Basic Error-Handling

 
Data files are often incomplete, corrupted, or misformatted. So how do you deal with them?

Today you’ll learn:

  • How to read and write structured files
  • How to gracefully handle errors
  • How to skip or log bad rows without crashing

 

// 1. Safer File Reading

What happens when you try reading a file that doesn’t exist? Here’s how you try opening the file and catch a FileNotFoundError if it isn’t there.

try:
    with open("employees.txt") as file:
        lines = file.readlines()
except FileNotFoundError:
    print("Error: File not found.")
    lines = []

 

// 2. Handling Bad Rows Gracefully

Now let’s try to skip bad rows and process only the complete rows.

records = []

for line in lines:
    try:
        parts = line.strip().split(",")
        if len(parts) != 4:
            raise ValueError("Incorrect number of fields")
        record = {
            "name": parts[0],
            "city": parts[1],
            "role": parts[2],
            "salary": int(parts[3])
        }
        records.append(record)
    except Exception as e:
        print(f"Skipping bad line: {line.strip()} ({e})")

 

// 3. Writing Cleaned Data to a File

Finally, let’s write the cleaned data to a file.

with open("cleaned_employees.txt", "w") as out:
    for r in records:
        out.write(f"{r['name']},{r['city']},{r['role']},{r['salary']}\n")

 

//  Exercise: Make a Fault-Tolerant Loader

Create a file raw_employees.txt with a couple of incomplete or messy lines like:

Alice,London,Engineer,75000
Bob,Paris,Analyst
Eve,London,Engineer,eighty thousand
John,New York,Manager,95000

 
Write a script that:

  1. Loads only valid records
  2. Prints number of valid rows
  3. Writes them to validated_employees.txt

 

Day 7: Build a Mini Data Profiler (Project Day)

 
Great work on making it this far. Today, you’ll create a standalone Python script that:

  • Loads a CSV file
  • Detects column names and types
  • Computes useful stats
  • Writes a summary report

 

// Step-by-Step Outline

1. Load the file:

def load_csv(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]
    header = lines[0].split(",")
    rows = [line.split(",") for line in lines[1:]]
    return header, rows
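
One caveat: splitting on commas by hand breaks on quoted fields that themselves contain commas. If you ever hit that, the standard library’s csv module handles quoting for you; a drop-in sketch of the same loader:

import csv

def load_csv(filename):
    with open(filename, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    return rows[0], rows[1:]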

 

2. Detect column types:

def detect_type(value):
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "text"

 

3. Profile each column:

def profile_columns(header, rows):
    summary = {}
    for i, col in enumerate(header):
        values = [row[i].strip() for row in rows if len(row) == len(header)]
        col_type = detect_type(values[0])
        unique = set(values)
        summary[col] = {
            "type": col_type,
            "unique_count": len(unique),
            "most_common": max(set(values), key=values.count)
        }
        if col_type == "numeric":
            nums = [float(v) for v in values if v.replace('.', '', 1).isdigit()]
            summary[col]["average"] = sum(nums) / len(nums) if nums else 0
    return summary

 

4. Write the summary report:

def write_summary(summary, out_file):
    with open(out_file, "w") as f:
        for col, stats in summary.items():
            f.write(f"Column: {col}\n")
            for k, v in stats.items():
                f.write(f"  {k}: {v}\n")
            f.write("\n")

 

You can use the functions like so:

header, rows = load_csv("employees.csv")
summary = profile_columns(header, rows)
write_summary(summary, "profile_report.txt")

 

//  Final Exercise

Use your own CSV file (or reuse earlier ones). Run the profiler and check the output.

 

Conclusion

 
Congratulations! You’ve completed the Python for Data Science mini-course.

Over this week, you’ve moved from basic Python data structures to writing modular functions and scripts that handle real data problems. These are the basics, and by that I mean really basic stuff. Use this as a starting point and learn more about Python’s standard library (by doing, of course).
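
For instance, collections.Counter and statistics.mean replace the counting and averaging you wrote by hand this week (shown here on the Day 4 salary figures):

from collections import Counter
from statistics import mean

roles = ["Engineer", "Analyst", "Engineer", "Manager", "Analyst"]
print(Counter(roles))
print(mean([75000, 62000, 80000, 95000, 64000]))

Output:

Counter({'Engineer': 2, 'Analyst': 2, 'Manager': 1})
75200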

Thank you for learning with me. Happy coding and data crunching ahead!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


