Python String Formatting & Methods

Strings are everywhere in data engineering — log messages, SQL queries, CSV parsing, email validation, config values. Getting comfortable with f-strings and string methods is one of those things that quietly saves you hours. It’s not glamorous, but messy string handling is the source of more silent bugs than most people admit.

Modern String Formatting: F-Strings

F-strings (formatted string literals, Python 3.6+) are the modern, preferred way to format strings. They’re faster, more readable, and less error-prone than older methods.

Basic F-String Syntax

name = "Alice"
age = 30
 
# Simple variable insertion
message = f"Hello, {name}!"  # "Hello, Alice!"
 
# Expressions inside braces
result = f"In 5 years, {name} will be {age + 5}"  # "In 5 years, Alice will be 35"
 
# Multiple variables
data = f"Name: {name}, Age: {age}"  # "Name: Alice, Age: 30"

Formatting Numbers in F-Strings

# Decimal places for floats
price = 19.99
print(f"Price: ${price:.2f}")  # "Price: $19.99"
 
# Padding integers with zeros
user_id = 42
print(f"ID: {user_id:05d}")  # "ID: 00042"
 
# Thousands separator
population = 1000000
print(f"Population: {population:,}")  # "Population: 1,000,000"
 
# Percentage
success_rate = 0.9543
print(f"Success: {success_rate:.1%}")  # "Success: 95.4%"

Formatting with Expressions

# Method calls inside f-strings
name = "alice"
print(f"Welcome, {name.upper()}!")  # "Welcome, ALICE!"
 
# List comprehensions
numbers = [1, 2, 3, 4, 5]
print(f"Squares: {[x**2 for x in numbers]}")  # "Squares: [1, 4, 9, 16, 25]"
 
# Conditional expressions
age = 20
status = f"Adult" if age >= 18 else "Minor"
print(f"Status: {status}")  # "Status: Adult"

Multi-line F-Strings

user = {"name": "Alice", "age": 30, "city": "Paris"}
message = f"""
User Profile:
  Name: {user['name']}
  Age: {user['age']}
  City: {user['city']}
"""
print(message)

Older String Formatting (For Legacy Code)

You may encounter older code using these methods. Understand them but prefer f-strings in new code.

`.format()` Method

name = "Bob"
age = 25
 
# Basic usage
print("{} is {} years old".format(name, age))  # "Bob is 25 years old"
 
# Named parameters
print("{name} is {age} years old".format(name="Bob", age=25))
 
# Formatting numbers
print("Price: ${:.2f}".format(19.99))  # "Price: $19.99"

Modulo Operator (%)

name = "Charlie"
age = 35
 
# %s for strings, %d for integers, %f for floats
print("%s is %d years old" % (name, age))  # "Charlie is 35 years old"
 
# Formatting floats
print("Price: $%.2f" % 19.99)  # "Price: $19.99"

Essential String Methods

String methods transform, validate, and analyze text data.

Cleaning and Whitespace

# Remove leading/trailing whitespace
text = "  hello world  "
print(text.strip())      # "hello world"
print(text.lstrip())     # "hello world  " (left only)
print(text.rstrip())     # "  hello world" (right only)
 
# Remove leading/trailing specific characters
path = "/data/file.csv/"
print(path.strip("/"))   # "data/file.csv"

Case Conversion

text = "Hello World"
 
print(text.upper())      # "HELLO WORLD"
print(text.lower())      # "hello world"
print(text.capitalize()) # "Hello world" (first char only)
print(text.title())      # "Hello World" (each word)
print(text.swapcase())   # "hELLO wORLD"

Searching and Checking

text = "data science"
 
# Check if substring exists
"science" in text           # True
text.startswith("data")     # True
text.endswith("science")    # True
text.find("science")        # 5 (index of first occurrence)
text.count("a")             # 2 (occurrences of "a")
 
# Check character types
"123".isdigit()             # True
"hello".isalpha()           # True
"  ".isspace()              # True
"Hello123".isalnum()        # True

Splitting and Joining

# Split string into list
csv_line = "Alice,30,Engineer"
parts = csv_line.split(",")  # ["Alice", "30", "Engineer"]
 
# Split with limit
text = "a:b:c:d"
limited = text.split(":", 2)  # ["a", "b", "c:d"] (max 2 splits)
 
# Join list into string
names = ["Alice", "Bob", "Charlie"]
print(", ".join(names))  # "Alice, Bob, Charlie"
 
# Split lines
csv_data = "name,age\nAlice,30\nBob,25"
lines = csv_data.split("\n")  # ["name,age", "Alice,30", "Bob,25"]

Finding and Replacing

text = "The quick brown fox jumps over the lazy dog"
 
# Find substring (returns index, or -1 if not found)
print(text.find("brown"))      # 10
print(text.find("missing"))    # -1
 
# Replace occurrences
print(text.replace("fox", "cat"))      # "The quick brown cat..."
print(text.replace("the", "a", 1))     # Limit replacements: "a quick brown fox..."
 
# Replace with method chaining (common in ETL)
dirty = "  john doe  "
clean = dirty.strip().lower().replace(" ", "_")  # "john_doe"

Data Engineering Patterns

Parsing CSV Data

def parse_csv_line(line: str) -> dict[str, str]:
    """Parse CSV line into dictionary."""
    # Remove whitespace, split on comma
    parts = [p.strip() for p in line.split(",")]
    keys = ["id", "name", "age", "city"]
    return dict(zip(keys, parts))
 
# Usage
line = "  1  , alice  , 30 , Paris"
record = parse_csv_line(line)
# {"id": "1", "name": "alice", "age": "30", "city": "Paris"}

Validating Input Strings

def validate_email(email: str) -> bool:
    """Basic email validation."""
    # Remove whitespace
    email = email.strip().lower()
    
    # Check format
    if "@" not in email or "." not in email.split("@")[1]:
        return False
    
    return True
 
def validate_phone(phone: str) -> bool:
    """Validate phone number (digits and hyphens only)."""
    phone = phone.strip()
    
    # Remove common separators
    cleaned = phone.replace("-", "").replace(" ", "")
    
    # Check if all digits
    return cleaned.isdigit() and 10 <= len(cleaned) <= 15

Normalizing Data

def normalize_record(record: dict[str, str]) -> dict[str, str]:
    """Normalize string fields in a record."""
    normalized = {}
    
    for key, value in record.items():
        # Strip whitespace, lowercase
        normalized[key] = value.strip().lower()
        
        # Remove special characters from specific fields
        if key == "phone":
            normalized[key] = value.replace("-", "").replace(" ", "")
    
    return normalized
 
# Usage
raw = {"name": "  ALICE  ", "email": "  [email protected]  ", "phone": "555-123-4567"}
clean = normalize_record(raw)
# {"name": "alice", "email": "[email protected]", "phone": "5551234567"}

Building Dynamic Queries

def build_sql_filter(filters: dict[str, str]) -> str:
    """Build WHERE clause from filter dict."""
    conditions = []
    
    for key, value in filters.items():
        # Safely handle string values
        value_safe = value.strip().replace("'", "''")  # Escape quotes
        conditions.append(f"{key} = '{value_safe}'")
    
    return " AND ".join(conditions)
 
# Usage
filters = {"status": "active", "country": "France"}
where = build_sql_filter(filters)
# "status = 'active' AND country = 'France'"

Logging Pipeline Events

import logging
from datetime import datetime
 
logger = logging.getLogger(__name__)
 
def log_pipeline_event(
    stage: str,
    status: str,
    count: int,
    duration: float,
    error: str | None = None
) -> None:
    """Log ETL pipeline event."""
    timestamp = datetime.now().isoformat()
    
    message = (
        f"[{timestamp}] {stage:15} | "
        f"Status: {status:10} | "
        f"Records: {count:8,} | "
        f"Duration: {duration:6.2f}s"
    )
    
    if error:
        message += f" | Error: {error}"
        logger.error(message)
    else:
        logger.info(message)
 
# Usage
log_pipeline_event("Extract", "Success", 50000, 12.5)
# [2026-01-22T14:30:15.123456] Extract      | Status: Success    | Records:    50,000 | Duration:  12.50s

Comparison: F-String vs Other Methods

Method	Syntax	Speed	Readability	Use Case
F-string	`f"{var}"`	⚡ Fastest	⭐⭐⭐ Excellent	Modern code (Python 3.6+)
.format()	`"{0}".format(var)`	⭐ Medium	⭐⭐ Good	Legacy code, older Python
% operator	`"%s" % var`	⭐ Slower	⭐ Poor	Old code only

Tips & Gotchas

F-strings are expressions, not strings. They’re evaluated at runtime.

# ✅ Works
price = 19.99
message = f"Price: ${price:.2f}"
 
# ❌ Doesn't work (can't use f-string in raw string)
raw_sql = fr"SELECT * FROM users WHERE id = {user_id}"  # rf"..." is ok

Escape braces with double braces in f-strings.

# ❌ Wrong
print(f"JSON: {{'name': 'Alice'}}")  # SyntaxError
 
# ✅ Correct
print(f"JSON: {{'name': 'Alice'}}")  # "JSON: {'name': 'Alice'}"

.strip() removes whitespace, not specific characters by default.

# ❌ Wrong expectation
text = "data.csv"
print(text.strip(".csv"))  # "data" (removes 'd', 'a' from ends too!)
 
# ✅ For removing file extensions, use different approach
print(text.rsplit(".", 1)[0])  # "data"

String methods don’t modify in-place—they return new strings.

text = "  hello  "
text.strip()      # ❌ Doesn't modify text
text = text.strip()  # ✅ Reassign to update

Use .lower() for case-insensitive comparison, not .upper().

# Compare emails (case-insensitive)
email1 = "[email protected]"
email2 = "[email protected]"
 
# ✅ Correct
print(email1.lower() == email2.lower())  # True

Split can produce empty strings—filter them if needed.

text = "a,,b,c"
parts = text.split(",")           # ["a", "", "b", "c"]
parts = [p for p in parts if p]   # ["a", "b", "c"] (filter empty)

Python-Loops — Loop through strings character by character
Python-List-Comprehensions — Transform lists of strings
Python-Control-Flow — Validate strings with conditions
Python-Modules-Functions-Lists — String methods in functions
Official String Methods Documentation
Real Python: F-Strings Guide

Key Takeaway:
Use f-strings for modern code—they’re fast, readable, and expressive. Master string methods like .strip(), .split(), .join(), and .replace() for data cleaning. Test your formatting and validation logic thoroughly; data quality depends on it.

Explorer

Ben Warai Otoko

Python String Formatting & Methods

Table of Contents

Python String Formatting & Methods

Modern String Formatting: F-Strings

Basic F-String Syntax

Formatting Numbers in F-Strings

Formatting with Expressions

Multi-line F-Strings

Older String Formatting (For Legacy Code)

`.format()` Method

Modulo Operator (%)

Essential String Methods

Cleaning and Whitespace

Case Conversion

Searching and Checking

Splitting and Joining

Finding and Replacing

Data Engineering Patterns

Parsing CSV Data

Validating Input Strings

Normalizing Data

Building Dynamic Queries

Logging Pipeline Events

Comparison: F-String vs Other Methods

Tips & Gotchas

Graph View

Backlinks

Recent Notes

Second Brain

Projects

Blog Posts

Learning Logs

Explorer

Python String Formatting & Methods

Table of Contents

Python String Formatting & Methods

Modern String Formatting: F-Strings

Basic F-String Syntax

Formatting Numbers in F-Strings

Formatting with Expressions

Multi-line F-Strings

Older String Formatting (For Legacy Code)

.format() Method

Modulo Operator (%)

Essential String Methods

Cleaning and Whitespace

Case Conversion

Searching and Checking

Splitting and Joining

Finding and Replacing

Data Engineering Patterns

Parsing CSV Data

Validating Input Strings

Normalizing Data

Building Dynamic Queries

Logging Pipeline Events

Comparison: F-String vs Other Methods

Tips & Gotchas

Related

Graph View

Backlinks

Recent Notes

Second Brain

Projects

Blog Posts

Learning Logs

`.format()` Method