String formatting is how you build messages, logs, and output from variables. String methods help you clean, transform, and validate text data. In Data Engineering, you’ll constantly parse CSVs, validate input, construct SQL queries, and log pipeline events—all requiring solid string skills.
Modern String Formatting: F-Strings
F-strings (formatted string literals, Python 3.6+) are the modern, preferred way to format strings. They’re faster, more readable, and less error-prone than older methods.
Basic F-String Syntax
name = "Alice"
age = 30
# Simple variable insertion
message = f"Hello, {name}!" # "Hello, Alice!"
# Expressions inside braces
result = f"In 5 years, {name} will be {age + 5}" # "In 5 years, Alice will be 35"
# Multiple variables
data = f"Name: {name}, Age: {age}" # "Name: Alice, Age: 30"Formatting Numbers in F-Strings
# Decimal places for floats
price = 19.99
print(f"Price: ${price:.2f}") # "Price: $19.99"
# Padding integers with zeros
user_id = 42
print(f"ID: {user_id:05d}") # "ID: 00042"
# Thousands separator
population = 1000000
print(f"Population: {population:,}") # "Population: 1,000,000"
# Percentage
success_rate = 0.9543
print(f"Success: {success_rate:.1%}") # "Success: 95.4%"Formatting with Expressions
# Method calls inside f-strings
name = "alice"
print(f"Welcome, {name.upper()}!") # "Welcome, ALICE!"
# List comprehensions
numbers = [1, 2, 3, 4, 5]
print(f"Squares: {[x**2 for x in numbers]}") # "Squares: [1, 4, 9, 16, 25]"
# Conditional expressions
age = 20
status = f"Adult" if age >= 18 else "Minor"
print(f"Status: {status}") # "Status: Adult"Multi-line F-Strings
user = {"name": "Alice", "age": 30, "city": "Paris"}
message = f"""
User Profile:
Name: {user['name']}
Age: {user['age']}
City: {user['city']}
"""
print(message)Older String Formatting (For Legacy Code)
You may encounter older code using these methods. Understand them but prefer f-strings in new code.
.format() Method
name = "Bob"
age = 25
# Basic usage
print("{} is {} years old".format(name, age)) # "Bob is 25 years old"
# Named parameters
print("{name} is {age} years old".format(name="Bob", age=25))
# Formatting numbers
print("Price: ${:.2f}".format(19.99)) # "Price: $19.99"Modulo Operator (%)
name = "Charlie"
age = 35
# %s for strings, %d for integers, %f for floats
print("%s is %d years old" % (name, age)) # "Charlie is 35 years old"
# Formatting floats
print("Price: $%.2f" % 19.99) # "Price: $19.99"Essential String Methods
String methods transform, validate, and analyze text data.
Cleaning and Whitespace
# Remove leading/trailing whitespace
text = " hello world "
print(text.strip()) # "hello world"
print(text.lstrip()) # "hello world " (left only)
print(text.rstrip()) # " hello world" (right only)
# Remove leading/trailing specific characters
path = "/data/file.csv/"
print(path.strip("/")) # "data/file.csv"Case Conversion
text = "Hello World"
print(text.upper()) # "HELLO WORLD"
print(text.lower()) # "hello world"
print(text.capitalize()) # "Hello world" (first char only)
print(text.title()) # "Hello World" (each word)
print(text.swapcase()) # "hELLO wORLD"Searching and Checking
text = "data science"
# Check if substring exists
"science" in text # True
text.startswith("data") # True
text.endswith("science") # True
text.find("science") # 5 (index of first occurrence)
text.count("a") # 2 (occurrences of "a")
# Check character types
"123".isdigit() # True
"hello".isalpha() # True
" ".isspace() # True
"Hello123".isalnum() # TrueSplitting and Joining
# Split string into list
csv_line = "Alice,30,Engineer"
parts = csv_line.split(",") # ["Alice", "30", "Engineer"]
# Split with limit
text = "a:b:c:d"
limited = text.split(":", 2) # ["a", "b", "c:d"] (max 2 splits)
# Join list into string
names = ["Alice", "Bob", "Charlie"]
print(", ".join(names)) # "Alice, Bob, Charlie"
# Split lines
csv_data = "name,age\nAlice,30\nBob,25"
lines = csv_data.split("\n") # ["name,age", "Alice,30", "Bob,25"]Finding and Replacing
text = "The quick brown fox jumps over the lazy dog"
# Find substring (returns index, or -1 if not found)
print(text.find("brown")) # 10
print(text.find("missing")) # -1
# Replace occurrences
print(text.replace("fox", "cat")) # "The quick brown cat..."
print(text.replace("the", "a", 1)) # Limit replacements: "a quick brown fox..."
# Replace with method chaining (common in ETL)
dirty = " john doe "
clean = dirty.strip().lower().replace(" ", "_") # "john_doe"Data Engineering Patterns
Parsing CSV Data
def parse_csv_line(line: str) -> dict[str, str]:
"""Parse CSV line into dictionary."""
# Remove whitespace, split on comma
parts = [p.strip() for p in line.split(",")]
keys = ["id", "name", "age", "city"]
return dict(zip(keys, parts))
# Usage
line = " 1 , alice , 30 , Paris"
record = parse_csv_line(line)
# {"id": "1", "name": "alice", "age": "30", "city": "Paris"}Validating Input Strings
def validate_email(email: str) -> bool:
"""Basic email validation."""
# Remove whitespace
email = email.strip().lower()
# Check format
if "@" not in email or "." not in email.split("@")[1]:
return False
return True
def validate_phone(phone: str) -> bool:
"""Validate phone number (digits and hyphens only)."""
phone = phone.strip()
# Remove common separators
cleaned = phone.replace("-", "").replace(" ", "")
# Check if all digits
return cleaned.isdigit() and 10 <= len(cleaned) <= 15Normalizing Data
def normalize_record(record: dict[str, str]) -> dict[str, str]:
"""Normalize string fields in a record."""
normalized = {}
for key, value in record.items():
# Strip whitespace, lowercase
normalized[key] = value.strip().lower()
# Remove special characters from specific fields
if key == "phone":
normalized[key] = value.replace("-", "").replace(" ", "")
return normalized
# Usage
raw = {"name": " ALICE ", "email": " [email protected] ", "phone": "555-123-4567"}
clean = normalize_record(raw)
# {"name": "alice", "email": "[email protected]", "phone": "5551234567"}Building Dynamic Queries
def build_sql_filter(filters: dict[str, str]) -> str:
"""Build WHERE clause from filter dict."""
conditions = []
for key, value in filters.items():
# Safely handle string values
value_safe = value.strip().replace("'", "''") # Escape quotes
conditions.append(f"{key} = '{value_safe}'")
return " AND ".join(conditions)
# Usage
filters = {"status": "active", "country": "France"}
where = build_sql_filter(filters)
# "status = 'active' AND country = 'France'"Logging Pipeline Events
import logging
from datetime import datetime
logger = logging.getLogger(__name__)
def log_pipeline_event(
stage: str,
status: str,
count: int,
duration: float,
error: str | None = None
) -> None:
"""Log ETL pipeline event."""
timestamp = datetime.now().isoformat()
message = (
f"[{timestamp}] {stage:15} | "
f"Status: {status:10} | "
f"Records: {count:8,} | "
f"Duration: {duration:6.2f}s"
)
if error:
message += f" | Error: {error}"
logger.error(message)
else:
logger.info(message)
# Usage
log_pipeline_event("Extract", "Success", 50000, 12.5)
# [2026-01-22T14:30:15.123456] Extract | Status: Success | Records: 50,000 | Duration: 12.50sComparison: F-String vs Other Methods
| Method | Syntax | Speed | Readability | Use Case |
|---|---|---|---|---|
| F-string | f"{var}" | ⚡ Fastest | ⭐⭐⭐ Excellent | Modern code (Python 3.6+) |
| .format() | "{0}".format(var) | ⭐ Medium | ⭐⭐ Good | Legacy code, older Python |
| % operator | "%s" % var | ⭐ Slower | ⭐ Poor | Old code only |
Tips & Gotchas
- F-strings are expressions, not strings. They’re evaluated at runtime.
# ✅ Works
price = 19.99
message = f"Price: ${price:.2f}"
# ❌ Doesn't work (can't use f-string in raw string)
raw_sql = fr"SELECT * FROM users WHERE id = {user_id}" # rf"..." is ok- Escape braces with double braces in f-strings.
# ❌ Wrong
print(f"JSON: {{'name': 'Alice'}}") # SyntaxError
# ✅ Correct
print(f"JSON: {{'name': 'Alice'}}") # "JSON: {'name': 'Alice'}".strip()removes whitespace, not specific characters by default.
# ❌ Wrong expectation
text = "data.csv"
print(text.strip(".csv")) # "data" (removes 'd', 'a' from ends too!)
# ✅ For removing file extensions, use different approach
print(text.rsplit(".", 1)[0]) # "data"- String methods don’t modify in-place—they return new strings.
text = " hello "
text.strip() # ❌ Doesn't modify text
text = text.strip() # ✅ Reassign to update- Use
.lower()for case-insensitive comparison, not.upper().
# Compare emails (case-insensitive)
email1 = "[email protected]"
email2 = "[email protected]"
# ✅ Correct
print(email1.lower() == email2.lower()) # True- Split can produce empty strings—filter them if needed.
text = "a,,b,c"
parts = text.split(",") # ["a", "", "b", "c"]
parts = [p for p in parts if p] # ["a", "b", "c"] (filter empty)Related
- Python-Loops — Loop through strings character by character
- Python-List-Comprehensions — Transform lists of strings
- Python-Control-Flow — Validate strings with conditions
- Python-Modules-Functions-Lists — String methods in functions
- Official String Methods Documentation
- Real Python: F-Strings Guide
Key Takeaway:
Use f-strings for modern code—they’re fast, readable, and expressive. Master string methods like .strip(), .split(), .join(), and .replace() for data cleaning. Test your formatting and validation logic thoroughly; data quality depends on it.