sqlx-record/.claude/skills/sqlx-batch-ops.md

sqlx-record Batch Operations Skill

Guide to insert_many() and upsert() for efficient bulk operations.

Triggers

  • "batch insert", "bulk insert"
  • "insert many", "insert_many"
  • "upsert", "insert or update"
  • "on conflict", "on duplicate key"

Overview

sqlx-record provides efficient batch operations:

  • insert_many() - Insert multiple records in a single query
  • upsert() - Insert or update on primary key conflict

insert_many()

Insert multiple entities in a single SQL statement:

pub async fn insert_many(executor, entities: &[Self]) -> Result<Vec<PkType>, Error>

Usage

use sqlx_record::prelude::*;

let users = vec![
    User { id: new_uuid(), name: "Alice".into(), email: "alice@example.com".into() },
    User { id: new_uuid(), name: "Bob".into(), email: "bob@example.com".into() },
    User { id: new_uuid(), name: "Carol".into(), email: "carol@example.com".into() },
];

// Insert all three users in a single query
let ids = User::insert_many(&pool, &users).await?;

println!("Inserted {} users", ids.len());

SQL Generated

-- MySQL
INSERT INTO users (id, name, email) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?)

-- PostgreSQL
INSERT INTO users (id, name, email) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9)

-- SQLite
INSERT INTO users (id, name, email) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?)

Benefits

  • Single round trip to the database
  • Much faster than N individual inserts
  • Atomic - all succeed or all fail (see the sketch below)
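
Because the batch is a single statement, one bad row rejects everything. A minimal sketch - the duplicated id is deliberate, and nothing is written when the statement fails:

let dup = new_uuid();
let users = vec![
    User { id: dup, name: "Dana".into(), email: "dana@example.com".into() },
    User { id: dup, name: "Erin".into(), email: "erin@example.com".into() },  // same PK as the first row
];

// The duplicate key fails the whole statement: neither row is inserted.
match User::insert_many(&pool, &users).await {
    Ok(ids) => println!("inserted {}", ids.len()),
    Err(e) => eprintln!("batch rejected, nothing written: {e}"),
}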

Limitations

  • Entities must implement Clone (needed to collect the returned PKs)
  • An empty slice returns an empty Vec without a database call
  • Very large batches may exceed database placeholder or packet limits; split into chunks as shown below

Chunked Insert

For very large datasets, split the batch into chunks. Atomicity then applies per chunk rather than to the whole load; wrap the loop in a transaction (see With Transaction below) if everything must commit together:

const BATCH_SIZE: usize = 1000;

async fn insert_large_dataset(pool: &Pool, users: Vec<User>) -> Result<Vec<Uuid>, sqlx::Error> {
    let mut all_ids = Vec::with_capacity(users.len());

    for chunk in users.chunks(BATCH_SIZE) {
        let ids = User::insert_many(pool, chunk).await?;
        all_ids.extend(ids);
    }

    Ok(all_ids)
}

upsert() / insert_or_update()

Insert a new record, or update it if the primary key already exists:

pub async fn upsert(&self, executor) -> Result<PkType, Error>
pub async fn insert_or_update(&self, executor) -> Result<PkType, Error>  // alias

Usage

let user = User {
    id: existing_or_new_id,
    name: "Alice".into(),
    email: "alice@example.com".into(),
};

// Insert if new, update if exists
user.upsert(&pool).await?;

// Or using alias
user.insert_or_update(&pool).await?;

SQL Generated

-- MySQL
INSERT INTO users (id, name, email) VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE name = VALUES(name), email = VALUES(email)

-- PostgreSQL
INSERT INTO users (id, name, email) VALUES ($1, $2, $3)
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email

-- SQLite
INSERT INTO users (id, name, email) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET name = excluded.name, email = excluded.email

Use Cases

  1. Sync external data: Import data that may already exist
  2. Idempotent operations: Safe to retry without duplicates
  3. Cache refresh: Update cached records atomically (see Cache Refresh below)

Examples

Sync Products

async fn sync_products(pool: &Pool, external_products: Vec<ExternalProduct>) -> Result<(), sqlx::Error> {
    for ext in external_products {
        let product = Product {
            id: ext.id,  // Use external ID as PK
            name: ext.name,
            price: ext.price,
            updated_at: chrono::Utc::now().timestamp_millis(),
        };
        product.upsert(pool).await?;  // one statement per product; wrap in a transaction for large syncs
    }
    Ok(())
}

Idempotent Event Processing

async fn process_event(pool: &Pool, event: Event) -> Result<(), sqlx::Error> {
    let record = ProcessedEvent {
        id: event.id,  // Event ID as PK - prevents duplicates
        event_type: event.event_type,
        payload: event.payload,
        processed_at: chrono::Utc::now().timestamp_millis(),
    };

    // Safe to call multiple times - won't create duplicates
    record.upsert(pool).await?;
    Ok(())
}
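
Cache Refresh

A sketch of the cache-refresh use case - CachedStats and count_orders() are hypothetical stand-ins; the pattern is just upsert() keyed on a stable PK so each refresh overwrites the previous snapshot:

async fn refresh_stats(pool: &Pool, account_id: Uuid) -> Result<(), sqlx::Error> {
    let stats = CachedStats {
        id: account_id,  // stable PK: exactly one cache row per account
        order_count: count_orders(pool, account_id).await?,  // assumed helper
        refreshed_at: chrono::Utc::now().timestamp_millis(),
    };

    // First call inserts the row; every later call updates it in place.
    stats.upsert(pool).await?;
    Ok(())
}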

With Transaction

use sqlx_record::transaction;

transaction!(&pool, |tx| {
    // Upsert multiple records atomically
    for item in items {
        item.upsert(&mut *tx).await?;
    }
    Ok::<_, sqlx::Error>(())
}).await?;

Comparison

Operation       Behavior on existing PK      SQL efficiency
insert()        Error (duplicate key)        Single row
insert_many()   Error (duplicate key)        Multiple rows, single query
upsert()        Updates all non-PK fields    Single row

Notes

  • upsert() updates ALL non-PK fields, not just changed ones
  • Primary key must be properly indexed (usually automatic)
  • For partial updates, use insert() + update_by_id() with a conflict check (see the sketch below)
  • insert_many() requires that all entities have PKs unique among themselves
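
A minimal sketch of that partial-update pattern, assuming find_by_id() and update_by_id() exist with roughly these shapes (check the actual sqlx-record signatures) and using sqlx's DatabaseError::is_unique_violation() as the conflict check:

async fn set_email(pool: &Pool, user: User) -> Result<(), sqlx::Error> {
    match user.insert(pool).await {
        // New row: nothing else to do.
        Ok(_) => Ok(()),
        // PK already exists: update only the field we care about,
        // instead of letting upsert() overwrite every non-PK column.
        Err(sqlx::Error::Database(db)) if db.is_unique_violation() => {
            let mut existing = User::find_by_id(pool, user.id).await?;  // assumed API
            existing.email = user.email;
            existing.update_by_id(pool).await?;  // assumed API
            Ok(())
        }
        Err(e) => Err(e),
    }
}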