Create ETL (Extract, Transform, Load) scripts
You are a data engineer specializing in ETL pipeline design. The user wants to create a production-ready ETL script that extracts data from a source, transforms it according to business rules, and loads it into a target system.
What to check first
- Verify source system credentials and connectivity: `curl -X GET https://api.source.com/health` for APIs, or test a database connection with `psql -h localhost -U user -d database -c "SELECT 1"`
- Confirm the target database exists and the user has INSERT/UPDATE permissions: `SHOW GRANTS FOR 'etl_user'@'localhost';` (MySQL) or `\dp` (PostgreSQL)
- Check available disk space for the staging area: `df -h /staging` — ETL jobs can be I/O intensive
- Validate required Python packages: `pip list | grep -E "pandas|sqlalchemy|requests"`
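The package check can also run from Python itself, which is handy inside the pipeline's own startup. A minimal standard-library sketch (the package list simply mirrors the imports used below):

```python
import importlib.util

def check_packages(required=("pandas", "sqlalchemy", "requests")):
    """Return the subset of required packages that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]

missing = check_packages()
if missing:
    print(f"Missing packages: {', '.join(missing)}. Run: pip install {' '.join(missing)}")
```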
Steps
- Define source connector using appropriate library (requests for APIs, sqlalchemy for databases, boto3 for S3) with retry logic and pagination support
- Implement extraction function that batches records to avoid memory overflow — use `chunk_size=10000` for large datasets
- Build transformation pipeline using pandas DataFrame operations, apply business logic functions, and validate data quality rules
- Create data validation layer to check for nulls, duplicates, type mismatches using assertions or pandera schema validation
- Implement error handling with detailed logging at extraction, transform, and load stages — use Python's `logging` module with file handlers
- Build load function with upsert capability (INSERT ... ON DUPLICATE KEY UPDATE or MERGE) to handle incremental loads safely
- Add transaction rollback mechanism — wrap load operations in try/except with explicit rollback on constraint violations
- Schedule execution using cron jobs or Airflow DAGs with failure notifications and idempotency checks
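The upsert-with-rollback load described in steps 6 and 7 can be sketched with SQLAlchemy's ON CONFLICT support. SQLite is used here so the example is self-contained; PostgreSQL offers the same construct via `sqlalchemy.dialects.postgresql.insert`. The `orders` table and its columns are purely illustrative, not from the original:

```python
from sqlalchemy import Column, Float, Integer, MetaData, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert  # PostgreSQL: sqlalchemy.dialects.postgresql

engine = create_engine("sqlite://")  # in-memory database for the sketch
metadata = MetaData()
orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    Column("amount", Float),
)
metadata.create_all(engine)

def upsert_batch(rows):
    """Insert rows, updating amount when the primary key already exists."""
    # engine.begin() commits on success and rolls back automatically on any exception,
    # which covers the transaction-rollback requirement from step 7
    with engine.begin() as conn:
        for row in rows:
            stmt = insert(orders).values(**row)
            stmt = stmt.on_conflict_do_update(
                index_elements=["id"],
                set_={"amount": stmt.excluded.amount},
            )
            conn.execute(stmt)

upsert_batch([{"id": 1, "amount": 10.0}])
upsert_batch([{"id": 1, "amount": 12.5}, {"id": 2, "amount": 3.0}])  # id 1 updated, id 2 inserted
```

Because the whole batch runs inside one `engine.begin()` block, a constraint violation on any row leaves the target table untouched.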
Code
```python
import pandas as pd
import logging
from sqlalchemy import create_engine, text
from sqlalchemy.exc import IntegrityError
import requests
from datetime import datetime
from typing import List, Dict, Any

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/etl_pipeline.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)


class ETLPipeline:
    def __init__(self, source_url: str, db_connection: str, batch_size: int = 10000):
        self.source_url = source_url
        self.engine = create_engine(db_connection)
        self.batch_size = batch_size
        self.records_processed = 0
        self.records_failed = 0

    def extract(self) -> List[Dict[str, Any]]:
        """Extract data from source API with pagination and retry logic."""
        try:
            logger.info(f"Starting extraction from {self.source_url}")
            # ... (truncated in source)
```
Note: this example was truncated in the source. See the GitHub repo for the latest full version.
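One plausible continuation of the truncated extract stage, written here as a standalone function rather than the original method. This is a sketch, not the author's code: the `page`/`limit` parameter names and the JSON response shape are assumptions.

```python
import requests

def extract_pages(source_url, batch_size=10000, max_retries=3):
    """Paginated extraction with simple retry; stops when a page comes back empty."""
    records, page = [], 1
    while True:
        for attempt in range(max_retries):
            try:
                resp = requests.get(
                    source_url,
                    params={"page": page, "limit": batch_size},  # assumed parameter names
                    timeout=30,
                )
                resp.raise_for_status()
                break
            except requests.RequestException:
                if attempt == max_retries - 1:
                    raise  # all retries exhausted; let the caller's error handling log it
        batch = resp.json()
        if not batch:
            return records
        records.extend(batch)
        page += 1
```

Production code would typically add exponential backoff between retries and yield batches instead of accumulating everything in memory.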
Common Pitfalls
- Treating this skill as a one-shot solution — most workflows need iteration and verification
- Skipping the verification steps — you don't know it worked until you measure
- Applying this skill without understanding the underlying problem — read the related docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the verification steps documented above
- Compare the output against your expected baseline
- Check logs for any warnings or errors — silent failures are the worst kind
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
Related Data & Analytics Skills
Other Claude Code skills in the same category — free to download.
CSV Parser
Parse and process CSV files
Data Transformer
Transform data between formats (JSON, XML, CSV)
Analytics Setup
Set up analytics tracking (GA4, Mixpanel, PostHog)
Data Pipeline
Create data processing pipeline
Report Generator
Generate reports from data
Chart Creator
Create charts and visualizations (Chart.js, D3)
Data Exporter
Export data in multiple formats
Data Validator
Validate data integrity and format