Upload Cache Guide
==================

ostruct's upload cache avoids uploading the same file more than once across runs, which shortens repeat runs and reduces API usage and bandwidth.

How It Works
------------

1. When you attach a file, ostruct computes its SHA-256 hash
2. The cache is checked for this hash
3. If found, the existing OpenAI file ID is reused
4. If not found, the file is uploaded and cached

This means identical files are uploaded exactly once, regardless of:

- How many times you run ostruct
- What names you use for the files
- Which tools (CI, FS) you attach them to
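
The lookup steps above can be sketched in Python. This is a minimal illustration, not ostruct's implementation: ``UploadCache``, ``get_or_upload`` and ``fake_upload`` are hypothetical names, and an in-memory dict stands in for the on-disk cache.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class UploadCache:
    """Hypothetical in-memory stand-in for the real cache."""

    def __init__(self, upload):
        self._upload = upload   # callable taking a Path, returning a file ID
        self._by_hash = {}      # content hash -> file ID

    def get_or_upload(self, path: Path) -> str:
        key = sha256_of(path)           # step 1: hash the content
        if key in self._by_hash:        # steps 2-3: reuse the ID on a hit
            return self._by_hash[key]
        file_id = self._upload(path)    # step 4: upload on a miss ...
        self._by_hash[key] = file_id    # ... and record the new ID
        return file_id


uploaded = []


def fake_upload(path: Path) -> str:
    # Placeholder for the real upload call; returns a made-up ID.
    uploaded.append(path.name)
    return f"file-{len(uploaded)}"


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "a.csv")
    renamed = Path(tmp, "b.csv")
    first.write_bytes(b"x,y\n1,2\n")
    renamed.write_bytes(b"x,y\n1,2\n")   # same bytes, different name

    cache = UploadCache(fake_upload)
    id_first = cache.get_or_upload(first)
    id_renamed = cache.get_or_upload(renamed)
```

Because the key is derived from content alone, the second call returns the first file's ID and ``fake_upload`` runs only once, even though the file name differs.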

Benefits
--------

- **Performance**: Instant reuse of previously uploaded files
- **Cost Savings**: Reduced API usage and bandwidth
- **Reliability**: Fewer uploads mean fewer potential failures

Configuration
-------------

The upload cache is enabled by default. Configure it in ``ostruct.yaml``:

.. code-block:: yaml

   uploads:
     persistent_cache: true      # Enable/disable cache
     preserve_cached_files: true # Enable TTL-based cache cleanup
     cache_max_age_days: 14      # Cache retention period (days)
     cache_path: null            # Use default path
     hash_algorithm: sha256      # Hash algorithm

Cache Cleanup & TTL Management
------------------------------

**TTL Rationale**: The default two-week window covers typical sprint cycles while keeping embedded storage fees below $0.02 per 100 MB.

ostruct automatically manages cached files using a Time-To-Live (TTL) system:

- **Default TTL**: 14 days (configurable)
- **Smart Cleanup**: Preserves cached files within TTL, deletes expired ones
- **Cost Control**: Prevents unlimited accumulation of storage costs
- **Privacy Compliance**: Supports immediate deletion for sensitive data
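
The expiry rule can be illustrated with a short Python sketch. It assumes that age is measured from a single recorded timestamp per entry; whether ostruct measures from upload time or last access is not specified here, and ``expired_ids`` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def expired_ids(entries, max_age_days, now=None):
    """Return the IDs whose recorded timestamp is older than max_age_days.

    `entries` maps a file ID to a timezone-aware datetime (assumed layout).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(fid for fid, stamp in entries.items() if stamp < cutoff)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
entries = {
    "file-old": now - timedelta(days=20),
    "file-new": now - timedelta(days=3),
}

default_ttl = expired_ids(entries, 14, now)   # only the 20-day-old entry
compliance = expired_ids(entries, 0, now)     # every existing entry
```

With a 14-day TTL only the 20-day-old entry is selected for deletion, while a TTL of 0 selects every entry, which matches the immediate-deletion behaviour described for compliance use.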

**Configuration Options**:

.. code-block:: yaml

   uploads:
     preserve_cached_files: true  # Enable TTL-based cleanup
     cache_max_age_days: 14       # Files older than this are deleted

**Common TTL Strategies**:

- **Development** (3-7 days): Frequent iterations, cost-conscious
- **Production** (14-30 days): Stability and performance focused
- **Compliance** (0 days): Immediate deletion for sensitive data

**Compliance Mode** (immediate deletion):

.. code-block:: yaml

   uploads:
     preserve_cached_files: false  # Disable preservation
     cache_max_age_days: 0         # Force immediate cleanup

Cache Location
--------------

By default, the cache is stored in platform-specific locations:

- **macOS**: ``~/Library/Caches/ostruct/upload_cache.sqlite``
- **Linux**: ``~/.cache/ostruct/upload_cache.sqlite``
- **Windows**: ``%LOCALAPPDATA%\ostruct\upload_cache.sqlite``
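
As a rough illustration of how the default location could be resolved, the Python sketch below maps a platform name to the paths listed above. ``default_cache_path`` is a hypothetical function; the platform, home directory and ``LOCALAPPDATA`` value are passed in explicitly so the logic can be checked on any machine.

```python
from pathlib import PurePosixPath, PureWindowsPath

CACHE_FILE = "upload_cache.sqlite"


def default_cache_path(platform, home, local_appdata=None):
    """Return the documented default cache path for a sys.platform value."""
    if platform == "darwin":        # macOS
        return str(PurePosixPath(home, "Library", "Caches", "ostruct", CACHE_FILE))
    if platform.startswith("win"):  # Windows reports "win32"
        return str(PureWindowsPath(local_appdata, "ostruct", CACHE_FILE))
    return str(PurePosixPath(home, ".cache", "ostruct", CACHE_FILE))


mac = default_cache_path("darwin", "/Users/me")
linux = default_cache_path("linux", "/home/me")
windows = default_cache_path("win32", None, r"C:\Users\me\AppData\Local")
```

In real use the arguments would come from ``sys.platform``, ``Path.home()`` and ``os.environ.get("LOCALAPPDATA")``.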

Command Line Options
--------------------

.. code-block:: bash

   # Disable cache for this run
   ostruct run template.j2 schema.json --no-cache-uploads

   # Disable cache preservation (force cleanup)
   ostruct run template.j2 schema.json --no-cache-preserve

   # Use custom cache location
   ostruct run template.j2 schema.json --cache-path ~/.my-cache/uploads.db

Environment Variables
---------------------

.. code-block:: bash

   # Disable cache globally
   export OSTRUCT_CACHE_UPLOADS=false

   # Use custom cache path
   export OSTRUCT_CACHE_PATH=/custom/path/cache.db

   # Use different hash algorithm
   export OSTRUCT_CACHE_ALGO=sha1

   # Configure cache cleanup
   export OSTRUCT_PRESERVE_CACHED_FILES=true
   export OSTRUCT_CACHE_MAX_AGE_DAYS=14
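
The following Python sketch shows one way these variables could be read into settings with the documented defaults. The helper names and the set of accepted "true" spellings are assumptions, and how ostruct ranks environment variables against ``ostruct.yaml`` and command line flags is not covered here.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}   # assumed spellings, not confirmed by ostruct


def env_bool(environ, name, default):
    raw = environ.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


def env_int(environ, name, default):
    raw = environ.get(name)
    return default if raw is None else int(raw)


def cache_settings(environ=os.environ):
    """Collect the cache-related variables, falling back to documented defaults."""
    return {
        "enabled": env_bool(environ, "OSTRUCT_CACHE_UPLOADS", True),
        "path": environ.get("OSTRUCT_CACHE_PATH"),   # None means platform default
        "algorithm": environ.get("OSTRUCT_CACHE_ALGO", "sha256"),
        "preserve": env_bool(environ, "OSTRUCT_PRESERVE_CACHED_FILES", True),
        "max_age_days": env_int(environ, "OSTRUCT_CACHE_MAX_AGE_DAYS", 14),
    }


defaults = cache_settings({})
overridden = cache_settings(
    {"OSTRUCT_CACHE_UPLOADS": "false", "OSTRUCT_CACHE_MAX_AGE_DAYS": "30"}
)
```

Passing a plain dict instead of ``os.environ`` keeps the sketch easy to test without touching the real environment.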

Performance Examples
--------------------

**First run** - uploads files:

.. code-block:: bash

   $ ostruct run analysis.j2 schema.json --file ci:data large_dataset.csv
   # Uploads large_dataset.csv (takes time based on file size)

**Subsequent runs** - reuses the cached upload and skips the upload step:

.. code-block:: bash

   $ ostruct run analysis.j2 schema.json --file ci:data large_dataset.csv
   # Reuses cached upload instantly

The cache works across all file attachments:

- Code Interpreter files (``--file ci:``)
- File Search documents (``--file fs:``)
- Multi-tool attachments (``--file ci,fs:``)

Troubleshooting
---------------

**Cache not working?**

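1. List the cache-related options with ``ostruct run --help | grep cache``, and confirm you are not passing ``--no-cache-uploads`` or setting ``OSTRUCT_CACHE_UPLOADS=false``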
2. Verify cache location has write permissions
3. Use ``--verbose`` to see cache operations

**Need to clear the cache?**

The cache automatically cleans up expired files based on TTL settings. For manual cleanup:

.. code-block:: bash

   # macOS
   rm ~/Library/Caches/ostruct/upload_cache.sqlite

   # Linux
   rm ~/.cache/ostruct/upload_cache.sqlite

   # Windows
   del %LOCALAPPDATA%\ostruct\upload_cache.sqlite

**Files being deleted too soon?**

Check your TTL configuration:

.. code-block:: bash

   # Extend TTL to 30 days
   export OSTRUCT_CACHE_MAX_AGE_DAYS=30

   # Make sure preservation is enabled (false forces cleanup)
   export OSTRUCT_PRESERVE_CACHED_FILES=true

**File changed but ostruct uses old version?**

The cache detects file changes via size and modification time. If a file genuinely changed, it will be re-uploaded automatically.
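
A size-and-mtime check of this kind might look like the Python sketch below. It is an illustration under assumptions: ``fingerprint`` and ``looks_unchanged`` are hypothetical names, and the sketch treats the metadata comparison as a cheap pre-check that decides whether a file needs to be hashed again.

```python
import os
import tempfile


def fingerprint(path):
    """Return (size in bytes, mtime in nanoseconds) for a file."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def looks_unchanged(path, recorded):
    # Cheap pre-check: compare the recorded metadata with the current stat.
    return fingerprint(path) == recorded


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as fh:
        fh.write("a,b\n1,2\n")
    recorded = fingerprint(path)
    same_before_edit = looks_unchanged(path, recorded)

    with open(path, "a") as fh:
        fh.write("3,4\n")   # size grows, so the fingerprint differs
    same_after_edit = looks_unchanged(path, recorded)
```

When the fingerprint differs, the file would be hashed again, and a new hash leads to a fresh upload through the normal cache-miss path.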

**Disable cache temporarily:**

.. code-block:: bash

   ostruct run template.j2 schema.json --no-cache-uploads

Technical Details
-----------------

- **Hash Algorithm**: SHA-256 by default (configurable)
- **Database**: SQLite with WAL mode for concurrency
- **File Validation**: Size and mtime checking to detect changes
- **TTL Management**: Automatic cleanup based on file age (14-day default)
- **LRU Behavior**: Last-accessed timestamps for intelligent cleanup
- **Error Handling**: Graceful degradation when cache unavailable
- **404 Recovery**: Automatic cache cleanup when files are manually deleted
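
To make the storage details more concrete, here is a minimal Python sketch of a hash-keyed SQLite table opened in WAL mode. The table layout and the function names (``open_cache``, ``remember``, ``lookup``, ``forget``) are hypothetical and do not describe ostruct's actual schema.

```python
import os
import sqlite3
import tempfile
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS uploads (
    hash          TEXT PRIMARY KEY,
    file_id       TEXT NOT NULL,
    last_accessed REAL NOT NULL
)
"""


def open_cache(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")   # readers do not block the writer
    conn.execute(SCHEMA)
    return conn


def remember(conn, content_hash, file_id):
    with conn:   # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO uploads VALUES (?, ?, ?)",
            (content_hash, file_id, time.time()),
        )


def lookup(conn, content_hash):
    """Return the cached file ID, refreshing last_accessed on a hit."""
    row = conn.execute(
        "SELECT file_id FROM uploads WHERE hash = ?", (content_hash,)
    ).fetchone()
    if row is None:
        return None
    with conn:
        conn.execute(
            "UPDATE uploads SET last_accessed = ? WHERE hash = ?",
            (time.time(), content_hash),
        )
    return row[0]


def forget(conn, file_id):
    # 404 recovery: the remote file is gone, so drop the stale row.
    with conn:
        conn.execute("DELETE FROM uploads WHERE file_id = ?", (file_id,))


with tempfile.TemporaryDirectory() as tmp:
    conn = open_cache(os.path.join(tmp, "upload_cache.sqlite"))
    remember(conn, "abc123", "file-1")
    hit = lookup(conn, "abc123")
    forget(conn, "file-1")
    miss = lookup(conn, "abc123")
    conn.close()
```

After ``forget`` removes the stale row, the next lookup for the same hash misses, so the file would be uploaded again instead of reusing a dead ID.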

Security Considerations
-----------------------

- Cache files are stored with user-only permissions on Unix systems
- File content hashes are computed locally, not sent to OpenAI
- Cache database contains only hashes and OpenAI file IDs, not file content
- No sensitive data is stored in the cache beyond what's already sent to OpenAI
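
On Unix systems, user-only permissions correspond to mode ``0600``. The Python sketch below shows one way to create a file with that mode; ``create_private_file`` is a hypothetical helper, and the sketch assumes a POSIX platform.

```python
import os
import stat
import tempfile


def create_private_file(path):
    """Create the file if needed and restrict it to the owning user."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.close(fd)
    os.chmod(path, 0o600)   # enforce 0600 even if the file already existed


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "upload_cache.sqlite")
    create_private_file(db_path)
    mode = stat.S_IMODE(os.stat(db_path).st_mode)
```

You can check an existing cache file the same way by inspecting ``stat.S_IMODE`` of its ``st_mode``.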

Integration with Tools
----------------------

The upload cache works seamlessly with all ostruct tools:

**Code Interpreter**:

.. code-block:: bash

   # First run uploads
   ostruct run analysis.j2 schema.json --file ci:data dataset.csv

   # Second run reuses cached file
   ostruct run different.j2 schema.json --file ci:analysis dataset.csv

**File Search**:

.. code-block:: bash

   # Cache works across different vector stores
   ostruct run search1.j2 schema.json --file fs:docs manual.pdf
   ostruct run search2.j2 schema.json --file fs:knowledge manual.pdf

**Multi-tool routing**:

.. code-block:: bash

   # Upload once, use in both tools
   ostruct run multi.j2 schema.json --file ci,fs:shared data.json

Error Handling
==============

The upload cache system includes comprehensive error handling that validates files before cache operations.

File Validation Before Caching
-------------------------------

All file operations validate existence and accessibility before interacting with the cache:

- **File existence**: Validates that files exist before attempting upload or cache lookup
- **Permission checks**: Ensures files are readable before processing
- **Symlink validation**: Checks that symlinks point to valid targets
- **Directory validation**: Confirms directories exist and are accessible

**Error Behavior:**

- File validation errors use exit code 9 (``FILE_ERROR``)
- Validation happens before cache lookup to prevent inconsistent state
- Cache remains clean - invalid files never create cache entries

**Example Error Cases:**

.. code-block:: bash

   # File validation prevents cache corruption
   ostruct files upload --file missing.txt
   # Error: File not found: missing.txt (Exit code 9)
   # Cache remains unchanged

   # Broken symlinks are detected early
   ostruct files upload --file broken_link.txt
   # Error: Broken symlink: broken_link.txt -> /nonexistent/target
   # No cache entry created
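
The validation-first order can be sketched in Python as follows. The exit code 9 comes from the text above; ``FileValidationError`` and ``validate`` are hypothetical names, the exact checks ostruct performs may differ, and creating the symlink assumes a POSIX platform.

```python
import os
import tempfile
from pathlib import Path

FILE_ERROR = 9   # exit code documented for file validation errors


class FileValidationError(Exception):
    exit_code = FILE_ERROR


def validate(path: Path) -> Path:
    """Raise before any cache access if the path cannot be used."""
    if path.is_symlink() and not path.exists():
        raise FileValidationError(f"Broken symlink: {path} -> {os.readlink(path)}")
    if not path.exists():
        raise FileValidationError(f"File not found: {path}")
    if not os.access(path, os.R_OK):
        raise FileValidationError(f"File not readable: {path}")
    return path


cache = {}      # stand-in for the real cache
messages = []

with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp, "missing.txt")
    broken = Path(tmp, "broken_link.txt")
    os.symlink(Path(tmp, "nonexistent_target"), broken)

    for candidate in (missing, broken):
        try:
            validate(candidate)
            cache[str(candidate)] = "file-id"   # reached only for valid files
        except FileValidationError as exc:
            messages.append(str(exc))
```

Both paths fail validation, so the cache dict stays empty, which mirrors the "no cache entry created" behaviour shown in the examples.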

Cache Consistency
-----------------

The validation-first approach ensures cache consistency:

- **No orphaned entries**: Invalid files never create cache records
- **Reliable lookups**: Cache hits always correspond to accessible files
- **Clean state**: Failed operations don't leave partial cache data

**Integration with File Commands:**

The ``ostruct files`` commands use the same validation logic as ``ostruct run``, ensuring consistent behavior across all file operations.