Use Snowpark for Python-based data engineering and ML in Snowflake
You are a Snowflake data engineer using Snowpark for Python. The user wants to build and execute Python-based data transformations and ML pipelines directly within Snowflake using Snowpark.
What to check first
- Run pip list | grep snowflake-snowpark to verify Snowpark for Python is installed (version 1.0+)
- Confirm Snowflake account credentials (account identifier, user, password, warehouse, database, schema)
- Check that your Snowflake account has compute resources (warehouse) active and accessible
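The installation check above can run locally before touching Snowflake at all. This is a minimal sketch; the PyPI distribution name is snowflake-snowpark-python:

```python
# Local preflight check for the Snowpark dependency; no Snowflake connection
# is needed. Returns the installed version string, or None if it is missing.
from importlib.metadata import version, PackageNotFoundError
from typing import Optional

def installed_version(pkg: str = "snowflake-snowpark-python") -> Optional[str]:
    """Return pkg's installed version, or None when it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

print(installed_version())  # a version string such as "1.x.y", or None
```

If this prints None, run the install step below before creating a session.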
Steps
- Install Snowpark: pip install snowflake-snowpark-python (add the [pandas] extra for pandas and numpy interoperability)
- Create a Snowflake Session object using Session.builder.configs() with account, user, password, warehouse, database, and schema parameters
- Load data into a DataFrame using session.table() for existing tables or session.create_dataframe() for Python objects
- Apply transformations using Snowpark DataFrame API methods like .select(), .filter(), .with_column(), and .group_by()
- Use snowflake.snowpark.functions for SQL-equivalent operations (e.g., col(), when(), lag(), dense_rank())
- For ML: use the separate snowflake-ml-python package (snowflake.ml.modeling) to apply scikit-learn-compatible preprocessing and models at scale
- Execute .collect() to materialize results locally, or .write.mode().save_as_table() to persist in Snowflake
- Close the session with session.close() when done
Code
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, when, avg, count
from snowflake.snowpark.types import StructType, StructField, StringType, IntegerType, DoubleType
import pandas as pd
# 1. Initialize Snowflake Session
connection_params = {
"account": "xy12345.us-east-1",
"user": "your_username",
"password": "your_password",
"warehouse": "compute_wh",
"database": "analytics_db",
"schema": "public"
}
session = Session.builder.configs(connection_params).create()
# 2. Load data from Snowflake table
df = session.table("sales_data")
# 3. Transform using Snowpark DataFrame API
df_transformed = (
df
.filter(col("amount") > 100)
.with_column("revenue_category",
when(col("amount") > 1000, "high")
.when(col("amount") > 500, "medium")
.otherwise("low"))
.group_by("product_id", "revenue_category")
.agg(
count("*").alias("transaction_count"),
avg("amount").alias("avg_amount")
)
)
# 4. Create a Snowpark DataFrame from Python data
local_data
Note: this example was truncated in the source. See the GitHub repo for the latest full version.
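Since step 4 is cut off above, here is a hedged sketch of how create_dataframe() and save_as_table() are typically used to finish such a pipeline. The rows, schema, and table name are illustrative, not recovered from the truncated source; it assumes the session and type imports from the snippet above.

```python
# Illustrative continuation (not the original truncated code): build a
# DataFrame from local Python rows, persist it, and close the session.
# Assumes `session` and the snowflake.snowpark.types imports from above.
schema = StructType([
    StructField("product_id", StringType()),
    StructField("amount", DoubleType()),
])
local_df = session.create_dataframe(
    [("p1", 120.0), ("p2", 750.0), ("p3", 1500.0)], schema=schema
)

# 5. Persist results back to Snowflake.
local_df.write.mode("overwrite").save_as_table("sales_data_sample")

# 6. Close the session when done.
session.close()
```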
Common Pitfalls
- Treating this skill as a one-shot solution — most workflows need iteration and verification
- Skipping the verification steps — you don't know it worked until you measure
- Applying this skill without understanding the underlying problem — read the related docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the verification steps documented above
- Compare the output against your expected baseline
- Check logs for any warnings or errors — silent failures are the worst kind
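For a Snowpark pipeline specifically, the baseline comparison above can be a simple row-count check. This sketch assumes an open session and the df_transformed DataFrame from the Code section; the persisted table name is illustrative.

```python
# Hedged verification sketch: compare the persisted table's row count against
# what the transformation produces. `sales_summary` is a placeholder name.
expected = df_transformed.count()
actual = session.table("sales_summary").count()
assert actual == expected, f"expected {expected} rows, found {actual}"
print(f"verified: {actual} rows persisted")
```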
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
Related Snowflake Skills
Other Claude Code skills in the same category — free to download.
Snowflake SQL
Write optimized Snowflake SQL with CTEs, window functions, and semi-structured data
Snowflake dbt Models
Build dbt models, tests, and macros for Snowflake transformations
Snowflake Streams & Tasks
Set up change data capture with streams and scheduled tasks
Snowflake Snowpipe
Configure continuous data ingestion with Snowpipe and external stages
Snowflake RBAC
Configure role-based access control with roles, privileges, and masking
Snowflake Stored Procedures
Write JavaScript and SQL stored procedures in Snowflake
Snowflake Data Sharing
Set up secure data sharing and data marketplace listings