Upload Cache Guide

ostruct’s upload cache eliminates duplicate file uploads across runs, providing significant performance improvements and cost savings.

How It Works

  1. When you attach a file, ostruct computes its SHA-256 hash

  2. The cache is checked for this hash

  3. If found, the existing OpenAI file ID is reused

  4. If not found, the file is uploaded and cached

This means identical files are uploaded exactly once, regardless of:

  • How many times you run ostruct

  • What names you use for the files

  • Which tools (CI, FS) you attach them to

Benefits

  • Performance: Instant reuse of previously uploaded files

  • Cost Savings: Reduced API usage and bandwidth

  • Reliability: Fewer uploads mean fewer potential failures

Configuration

The upload cache is enabled by default. Configure it in ostruct.yaml:

uploads:
  persistent_cache: true      # Enable/disable cache
  preserve_cached_files: true # Enable TTL-based cache cleanup
  cache_max_age_days: 14      # Cache retention period (days)
  cache_path: null            # Use default path
  hash_algorithm: sha256      # Hash algorithm

Cache Cleanup & TTL Management

TTL Rationale: Two-week window covers typical sprint cycles while keeping embedded storage fees <$0.02 per 100MB.

ostruct automatically manages cached files using a Time-To-Live (TTL) system:

  • Default TTL: 14 days (configurable)

  • Smart Cleanup: Preserves cached files within TTL, deletes expired ones

  • Cost Control: Prevents unlimited accumulation of storage costs

  • Privacy Compliance: Supports immediate deletion for sensitive data

Configuration Options:

uploads:
  preserve_cached_files: true  # Enable TTL-based cleanup
  cache_max_age_days: 14      # Files older than this are deleted

Common TTL Strategies:

  • Development (3-7 days): Frequent iterations, cost-conscious

  • Production (14-30 days): Stability and performance focused

  • Compliance (0 days): Immediate deletion for sensitive data

Compliance Mode (immediate deletion):

uploads:
  preserve_cached_files: false  # Disable preservation
  cache_max_age_days: 0        # Force immediate cleanup

Cache Location

By default, the cache is stored in platform-specific locations:

  • macOS: ~/Library/Caches/ostruct/upload_cache.sqlite

  • Linux: ~/.cache/ostruct/upload_cache.sqlite

  • Windows: %LOCALAPPDATA%\ostruct\upload_cache.sqlite

Command Line Options

# Disable cache for this run
ostruct run template.j2 schema.json --no-cache-uploads

# Disable cache preservation (force cleanup)
ostruct run template.j2 schema.json --no-cache-preserve

# Use custom cache location
ostruct run template.j2 schema.json --cache-path ~/.my-cache/uploads.db

Environment Variables

# Disable cache globally
export OSTRUCT_CACHE_UPLOADS=false

# Use custom cache path
export OSTRUCT_CACHE_PATH=/custom/path/cache.db

# Use different hash algorithm
export OSTRUCT_CACHE_ALGO=sha1

# Configure cache cleanup
export OSTRUCT_PRESERVE_CACHED_FILES=true
export OSTRUCT_CACHE_MAX_AGE_DAYS=14

Performance Examples

First run - uploads files:

$ ostruct run analysis.j2 schema.json --file ci:data large_dataset.csv
# Uploads large_dataset.csv (takes time based on file size)

Subsequent runs - reuses cached uploads (instant!):

$ ostruct run analysis.j2 schema.json --file ci:data large_dataset.csv
# Reuses cached upload instantly

The cache works across all file attachments:

  • Code Interpreter files (--file ci:)

  • File Search documents (--file fs:)

  • Multi-tool attachments (--file ci,fs:)

Troubleshooting

Cache not working?

  1. Check if cache is enabled: ostruct run --help | grep cache

  2. Verify cache location has write permissions

  3. Use --verbose to see cache operations

Need to clear the cache?

The cache automatically cleans up expired files based on TTL settings. For manual cleanup:

# macOS/Linux
rm ~/.cache/ostruct/upload_cache.sqlite

# Windows
del %LOCALAPPDATA%\ostruct\upload_cache.sqlite

Files being deleted too soon?

Check your TTL configuration:

# Extend TTL to 30 days
export OSTRUCT_CACHE_MAX_AGE_DAYS=30

# Or disable cleanup entirely
export OSTRUCT_PRESERVE_CACHED_FILES=false

File changed but ostruct uses old version?

The cache detects file changes via size and modification time. If a file genuinely changed, it will be re-uploaded automatically.

Disable cache temporarily:

ostruct run template.j2 schema.json --no-cache-uploads

Technical Details

  • Hash Algorithm: SHA-256 by default (configurable)

  • Database: SQLite with WAL mode for concurrency

  • File Validation: Size and mtime checking to detect changes

  • TTL Management: Automatic cleanup based on file age (14-day default)

  • LRU Behavior: Last-accessed timestamps for intelligent cleanup

  • Error Handling: Graceful degradation when cache unavailable

  • 404 Recovery: Automatic cache cleanup when files are manually deleted

Security Considerations

  • Cache files are stored with user-only permissions on Unix systems

  • File content hashes are computed locally, not sent to OpenAI

  • Cache database contains only hashes and OpenAI file IDs, not file content

  • No sensitive data is stored in the cache beyond what’s already sent to OpenAI

Integration with Tools

The upload cache works seamlessly with all ostruct tools:

Code Interpreter:

# First run uploads
ostruct run analysis.j2 schema.json --file ci:data dataset.csv

# Second run reuses cached file
ostruct run different.j2 schema.json --file ci:analysis dataset.csv

File Search:

# Cache works across different vector stores
ostruct run search1.j2 schema.json --file fs:docs manual.pdf
ostruct run search2.j2 schema.json --file fs:knowledge manual.pdf

Multi-tool routing:

# Upload once, use in both tools
ostruct run multi.j2 schema.json --file ci,fs:shared data.json

Error Handling

The upload cache system includes comprehensive error handling that validates files before cache operations.

File Validation Before Caching

All file operations validate existence and accessibility before interacting with the cache:

  • File existence: Validates that files exist before attempting upload or cache lookup

  • Permission checks: Ensures files are readable before processing

  • Symlink validation: Checks that symlinks point to valid targets

  • Directory validation: Confirms directories exist and are accessible

Error Behavior:

  • File validation errors use exit code 9 (FILE_ERROR)

  • Validation happens before cache lookup to prevent inconsistent state

  • Cache remains clean - invalid files never create cache entries

Example Error Cases:

# File validation prevents cache corruption
ostruct files upload --file missing.txt
# Error: File not found: missing.txt (Exit code 9)
# Cache remains unchanged

# Broken symlinks are detected early
ostruct files upload --file broken_link.txt
# Error: Broken symlink: broken_link.txt -> /nonexistent/target
# No cache entry created

Cache Consistency

The validation-first approach ensures cache consistency:

  • No orphaned entries: Invalid files never create cache records

  • Reliable lookups: Cache hits always correspond to accessible files

  • Clean state: Failed operations don’t leave partial cache data

Integration with File Commands:

The ostruct files commands use the same validation logic as ostruct run, ensuring consistent behavior across all file operations.