OST Front-matter Reference Guide
=================================

This reference guide covers the complete YAML front-matter structure for OST files - **Self-executing ostruct prompts**.

What are OST Files?
===================

OST files are **self-executing ostruct prompts** that package everything needed for AI-powered data processing into a single, executable file.

**Regular ostruct workflow:**
- Execute with: ``ostruct run template.j2 schema.json --var name=value --file data.csv``
- Requires separate template file, schema file, and command-line configuration
- User must know the specific variables, file types, and options needed

**OST file workflow:**
- Execute with: ``./my-tool.ost "input text" --format json`` or ``ostruct runx my-tool.ost --help``
- Everything bundled: template + schema + CLI configuration in one file
- Self-documenting with built-in help (``--help`` flag automatically generated)
- Command-line arguments are predefined and validated

**Key Differences:**

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Aspect
     - Regular ostruct templates
     - OST files (Self-executing prompts)
   * - Execution
     - ``ostruct run template.j2 schema.json``
     - ``./tool.ost`` or ``ostruct runx tool.ost``
   * - Schema
     - Separate ``.json`` file required
     - Embedded in YAML front-matter
   * - Configuration
     - Command-line flags and variables
     - Pre-configured CLI with custom arguments
   * - Help
     - Generic ``ostruct run --help``
     - Custom ``./tool.ost --help`` with tool-specific options
   * - Portability
     - Requires multiple files + knowledge
     - Single file, self-contained
   * - User Experience
     - Power users, developers
     - End users, simplified workflows

.. note::
   This guide is designed for non-Python users who need to understand the OST format. No programming knowledge is required.

OST File Structure
==================

OST files have three main sections:

1. **Shebang Line** - Makes the file executable on Unix systems
2. **YAML Front-matter** - Contains CLI metadata and schema (between ``---`` markers)
3. **Template Content** - Jinja2 template that generates the AI prompt

Basic Structure
---------------

.. code-block:: text

   #!/usr/bin/env -S ostruct runx
   ---
   cli:
     name: my-tool
     description: Description of what this tool does
     # ... CLI configuration

   schema: |
     {
       "type": "object",
       "properties": {
         "result": {"type": "string"}
       }
     }

   defaults:
     # ... default values

   global_args:
     # ... global argument policies
   ---
   # Your Jinja2 template content goes here
   Process this input: {{ input_text }}

Required Sections
=================

cli
---

The ``cli`` section defines the command-line interface for your template.

**Required Fields:**

.. code-block:: yaml

   cli:
     name: my-tool-name
     description: Brief description of what the tool does

**Optional Fields:**

.. code-block:: yaml

   cli:
     name: my-tool-name
     description: Brief description of what the tool does
     positional:
       - name: input_text
         help: The text to process
         default: "Hello World"
     options:
       format:
         names: ["--format", "-f"]
         help: Output format
         default: "json"
         choices: ["json", "yaml", "text"]

schema
------

The ``schema`` section contains the JSON schema that defines the structure of the output.

.. code-block:: yaml

   schema: |
     {
       "type": "object",
       "properties": {
         "result": {
           "type": "string",
           "description": "The processed result"
         },
         "format": {
           "type": "string",
           "description": "The output format used"
         }
       },
       "required": ["result", "format"]
     }

.. tip::
   Use the Schema Generator tool to create schemas automatically:

   .. code-block:: bash

      tools/schema-generator/run.sh -o my_schema.json my_template.j2

CLI Configuration
=================

Positional Arguments
--------------------

Define required or optional positional arguments:

.. code-block:: yaml

   cli:
     positional:
       - name: input_text
         help: The text to analyze
         # Optional: default value
         default: "Sample text"
       - name: output_file
         help: Where to save results
         # No default = required argument

Options (Flags)
---------------

Define command-line options with various behaviors:

Basic String Option
~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   cli:
     options:
       format:
         names: ["--format", "-f"]
         help: Output format
         default: "json"
         choices: ["json", "yaml", "text"]

Boolean Flag
~~~~~~~~~~~~

**Method 1: Using action (recommended)**

.. code-block:: yaml

   cli:
     options:
       verbose:
         names: ["--verbose", "-v"]
         help: Enable verbose output
         action: "store_true"  # Creates a boolean flag

**Method 2: Using type**

.. code-block:: yaml

   cli:
     options:
       debug:
         names: ["--debug"]
         help: Enable debug mode
         type: "bool"
         default: false

Repeatable Option
~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   cli:
     options:
       tags:
         names: ["--tag", "-t"]
         help: Add a tag (can be used multiple times)
         action: "append"  # Allows multiple values

File Input Option
~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   cli:
     options:
       config_file:
         names: ["--config"]
         help: Configuration file
         type: "file"
         target: "prompt"  # Template access only

       data_file:
         names: ["--data"]
         help: Data file for analysis
         type: "file"
         target: "ci"  # Code Interpreter

       docs_file:
         names: ["--docs"]
         help: Documentation file
         type: "file"
         target: "fs"  # File Search

Directory Input Option
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   cli:
     options:
       source_dir:
         names: ["--source"]
         help: Source directory
         type: "directory"
         target: "prompt"

Collection Input Option
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   cli:
     options:
       source_files:
         names: ["--files"]
         help: Collection of files matching a pattern
         type: "collection"
         target: "prompt"

       test_files:
         names: ["--tests"]
         help: Test files for analysis
         type: "collection"
         target: "ci"  # Code Interpreter

**What it does:** Collection type accepts glob patterns and collects multiple files that match the pattern. Unlike directory type which includes all files in a directory, collection type gives you precise control over which files are included.

**Usage Examples:**

.. code-block:: bash

   # Collect all Python files
   my-tool --files "**/*.py"

   # Collect specific test files
   my-tool --tests "test_*.py"

   # Multiple patterns (if action: append is used)
   my-tool --files "*.py" --files "*.js"

Action Parameters
=================

The ``action`` parameter controls how command-line arguments are processed:

store (default)
---------------

Stores a single value:

.. code-block:: yaml

   format:
     names: ["--format"]
     action: "store"  # Default - can be omitted
     help: Output format

store_true
----------

Creates a boolean flag that defaults to ``False``:

.. code-block:: yaml

   verbose:
     names: ["--verbose", "-v"]
     action: "store_true"
     help: Enable verbose output

Usage: ``./my_tool.ost --verbose`` sets ``verbose = True``

store_false
-----------

Creates a boolean flag that defaults to ``True``:

.. code-block:: yaml

   no_color:
     names: ["--no-color"]
     action: "store_false"
     help: Disable colored output

Usage: ``./my_tool.ost --no-color`` sets ``no_color = False``

append
------

Allows multiple values for the same option:

.. code-block:: yaml

   tags:
     names: ["--tag", "-t"]
     action: "append"
     help: Add a tag (repeatable)

Usage: ``./my_tool.ost --tag work --tag urgent`` creates ``tags = ["work", "urgent"]``

count
-----

Counts how many times an option is used:

.. code-block:: yaml

   verbosity:
     names: ["--verbose", "-v"]
     action: "count"
     help: Increase verbosity level

Usage: ``./my_tool.ost -vvv`` sets ``verbosity = 3``

CLI-Level Global Arguments
===========================

The ``global_args`` section can be placed either at the top level or within the ``cli`` section. When placed within the CLI section, it provides tool-specific global argument policies:

.. code-block:: yaml

   cli:
     name: my-tool
     description: My custom tool
     global_args:
       model:
         mode: "fixed"
         value: "gpt-4o"
       --temperature:
         mode: "allowed"
         allowed: [0.1, 0.5, 1.0]
       pass_through_global: false

**CLI-Level vs Top-Level global_args:**

- **CLI-level**: Tool-specific policies that override any top-level settings
- **Top-level**: Default policies for the entire OST file
- **Precedence**: CLI-level settings take priority over top-level settings

**When to Use CLI-Level:**

- When you need different policies per tool in complex OST files
- When the tool requires specific model restrictions
- When you want to prevent certain global flags from being used

**Example: Model Restriction**

.. code-block:: yaml

   cli:
     name: secure-analyzer
     description: Security analysis tool
     global_args:
       model:
         mode: "fixed"
         value: "gpt-4o"  # Force secure model
       pass_through_global: false  # Block unknown flags
     options:
       target:
         names: ["--target"]
         help: File to analyze

This ensures the security tool always uses a specific model and prevents users from changing critical settings.

File Routing Targets
====================

The ``target`` parameter controls where files are sent:

prompt (default)
----------------

Files are available in the template but not uploaded to external services:

.. code-block:: yaml

   config_file:
     names: ["--config"]
     type: "file"
     target: "prompt"  # Template access only

Template usage: ``{{ config_file.content }}``

ci (Code Interpreter)
---------------------

Files are uploaded to OpenAI's Code Interpreter for analysis:

.. code-block:: yaml

   data_file:
     names: ["--data"]
     type: "file"
     target: "ci"  # Code Interpreter analysis

The AI can execute Python code to analyze the file.

fs (File Search)
----------------

Files are uploaded to OpenAI's File Search for semantic search:

.. code-block:: yaml

   docs_file:
     names: ["--docs"]
     type: "file"
     target: "fs"  # File Search

The AI can search through the document content.

ud (User Data)
--------------

Files are sent to vision models for analysis:

.. code-block:: yaml

   pdf_file:
     names: ["--pdf"]
     type: "file"
     target: "ud"  # User-data for vision models

Currently supports PDF files for vision analysis.

auto
----

Automatically routes files based on type detection:

.. code-block:: yaml

   auto_file:
     names: ["--auto"]
     type: "file"
     target: "auto"  # Auto-route by file type

Text files go to ``prompt``, binary files to ``ud``.

Validation and Choices
======================

Restrict Input Values
---------------------

Use ``choices`` to limit allowed values:

.. code-block:: yaml

   format:
     names: ["--format", "-f"]
     choices: ["json", "yaml", "text"]
     default: "json"
     help: Output format

Type Validation
---------------

Specify expected data types:

.. code-block:: yaml

   count:
     names: ["--count", "-c"]
     type: "int"
     default: 10
     help: Number of items to process

   threshold:
     names: ["--threshold"]
     type: "float"
     default: 0.5
     help: Threshold value (0.0-1.0)

Required Arguments
------------------

Use ``required`` to make arguments mandatory:

.. code-block:: yaml

   input_file:
     names: ["--input", "-i"]
     help: Input file to process
     type: "file"
     required: true  # User must provide this argument
     target: "prompt"

   api_key:
     names: ["--api-key"]
     help: API key for authentication
     required: true  # No default value, must be provided

**Note**: Arguments with no ``default`` value are automatically required. Use ``required: true`` to explicitly mark optional arguments as mandatory.

Default Values
==============

The ``defaults`` section provides default values for template variables:

.. code-block:: yaml

   defaults:
     format: "json"
     verbose: false
     max_items: 100
     tags: []  # Empty list for append actions

These defaults are used when users don't provide values.

Global Arguments Policy
=======================

The ``global_args`` section controls how users can interact with ostruct's global flags.

Flag Naming Convention
----------------------

**Important**: Global argument keys must use the exact flag format with dashes:

.. code-block:: yaml

   global_args:
     --model:        # Correct: with dashes
       mode: "allowed"
       allowed: ["gpt-4o", "gpt-4o-mini"]

     model:          # INCORRECT: will cause validation errors
       mode: "allowed"

**Rule**: Use the complete flag name including dashes (e.g., ``--model``, ``--temperature``, ``--enable-tool``) exactly as you would type it on the command line.

Policy Configuration
--------------------

.. code-block:: yaml

   global_args:
     pass_through_global: true  # Allow unknown flags

     --model:
       mode: "allowed"
       allowed: ["gpt-4o", "gpt-4.1", "o1"]
       default: "gpt-4.1"

     --temperature:
       mode: "fixed"
       value: "0.7"

     --enable-tool:
       mode: "blocked"

     --verbose:
       mode: "pass-through"

Policy Modes
------------

allowed
~~~~~~~

Restricts users to specific values:

.. code-block:: yaml

   --model:
     mode: "allowed"
     allowed: ["gpt-4o", "gpt-4.1"]
     default: "gpt-4.1"

fixed
~~~~~

Locks a flag to a specific value:

.. code-block:: yaml

   --temperature:
     mode: "fixed"
     value: "0.7"

Users cannot override this value.

blocked
~~~~~~~

Completely prevents users from using a flag:

.. code-block:: yaml

   --enable-tool:
     mode: "blocked"

Any attempt to use this flag will result in an error.

pass-through
~~~~~~~~~~~~

Allows any value (default behavior):

.. code-block:: yaml

   --verbose:
     mode: "pass-through"

Global Flags
============

The ``global_flags`` section provides a list of default global flags that are always passed to ostruct, unless overridden by user input:

.. code-block:: yaml

   global_flags:
     - "--model"
     - "gpt-4o-mini"
     - "--temperature"
     - "0.7"
     - "--progress"
     - "none"

**Format:** A list of strings alternating between flags and their values.

**Usage Notes:**

- Flags are always passed to the underlying ``ostruct run`` command
- User-provided flags with ``allowed`` or ``pass-through`` policies will override these defaults
- Flags with ``fixed`` policies ignore both user input and global_flags defaults
- Use this for setting consistent tool defaults across template invocations

**Example with Policy Interaction:**

.. code-block:: yaml

   global_flags:
     - "--model"
     - "gpt-4o-mini"
     - "--temperature"
     - "0.5"

   global_args:
     --model:
       mode: "allowed"
       allowed: ["gpt-4o", "gpt-4o-mini"]
       # User can override the global_flags default
     --temperature:
       mode: "fixed"
       value: "0.7"
       # Fixed value ignores global_flags default

Complete Example
================

Here's a complete OST template that demonstrates all features:

.. code-block:: text

   #!/usr/bin/env -S ostruct runx
   ---
   cli:
     name: text-analyzer
     description: Analyzes text content and extracts insights

     positional:
       - name: input_text
         help: Text to analyze
         default: "Sample text for analysis"

     options:
       format:
         names: ["--format", "-f"]
         help: Output format
         choices: ["json", "yaml", "text"]
         default: "json"

       verbose:
         names: ["--verbose", "-v"]
         help: Enable verbose output
         action: "store_true"

       max_length:
         names: ["--max-length"]
         help: Maximum text length to process
         type: "int"
         default: 1000

       tags:
         names: ["--tag", "-t"]
         help: Add analysis tags (repeatable)
         action: "append"

       config_file:
         names: ["--config"]
         help: Configuration file
         type: "file"
         target: "prompt"

       data_file:
         names: ["--data"]
         help: Data file for Code Interpreter analysis
         type: "file"
         target: "ci"

   schema: |
     {
       "type": "object",
       "properties": {
         "analysis": {
           "type": "object",
           "properties": {
             "sentiment": {"type": "string"},
             "key_themes": {
               "type": "array",
               "items": {"type": "string"}
             },
             "word_count": {"type": "integer"},
             "tags": {
               "type": "array",
               "items": {"type": "string"}
             }
           },
           "required": ["sentiment", "key_themes", "word_count"]
         },
         "format": {"type": "string"},
         "verbose": {"type": "boolean"}
       },
       "required": ["analysis", "format", "verbose"]
     }

   defaults:
     format: "json"
     verbose: false
     max_length: 1000
     tags: []

   global_args:
     pass_through_global: true

     --model:
       mode: "allowed"
       allowed: ["gpt-4o", "gpt-4.1", "o1"]
       default: "gpt-4.1"

     --temperature:
       mode: "fixed"
       value: "0.7"

     --enable-tool:
       mode: "blocked"
   ---
   # Text Analysis Template

   Analyze the following text and provide insights:

   **Input Text:** {{ input_text }}
   **Format:** {{ format }}
   **Verbose Mode:** {{ verbose }}
   **Max Length:** {{ max_length }}

   {% if tags %}
   **Analysis Tags:** {{ tags | join(", ") }}
   {% endif %}

   {% if config_file is defined %}
   **Configuration:**
   {{ config_file.content }}
   {% endif %}

   {% if data_file is defined %}
   **Data File Available:** {{ data_file.name }}
   {% endif %}

   {% if verbose %}
   Please provide detailed analysis including:
   - Sentiment analysis with confidence scores
   - Key themes with supporting evidence
   - Word count and readability metrics
   - Detailed explanations for each finding
   {% else %}
   Please provide concise analysis including:
   - Overall sentiment
   - Main themes
   - Word count
   {% endif %}

   Return the analysis in the specified format ({{ format }}).

Usage Examples
==============

Once you've created an OST template, you can use it like a native CLI tool:

Basic Usage
-----------

.. code-block:: bash

   # Simple execution
   ./text-analyzer.ost "This is amazing news!"

   # With options
   ./text-analyzer.ost "Analyze this text" --format yaml --verbose

   # With tags
   ./text-analyzer.ost "Sample text" --tag urgent --tag review

   # With files
   ./text-analyzer.ost "Process this" --config settings.yaml --data report.csv

Help and Debugging
------------------

.. code-block:: bash

   # Get help (automatically generated)
   ./text-analyzer.ost --help

   # Dry run to test without API calls
   ostruct runx text-analyzer.ost "test input" --dry-run

   # Debug template rendering
   ostruct runx text-analyzer.ost "test input" --template-debug vars

Cross-Platform Usage
--------------------

.. code-block:: bash

   # Unix/Linux/macOS: Direct execution
   ./text-analyzer.ost "input text"

   # Windows: Via ostruct command
   ostruct runx text-analyzer.ost "input text"

   # All platforms: Via ostruct command
   ostruct runx text-analyzer.ost "input text"

Best Practices
==============

1. **Use Descriptive Names**

   .. code-block:: yaml

      # Good
      input_file:
        names: ["--input-file"]
        help: Input file to process

      # Avoid
      file:
        names: ["--file"]
        help: File

2. **Provide Clear Help Text**

   .. code-block:: yaml

      format:
        names: ["--format", "-f"]
        help: Output format (json, yaml, or text)
        choices: ["json", "yaml", "text"]

3. **Set Sensible Defaults**

   .. code-block:: yaml

      defaults:
        format: "json"
        verbose: false
        max_items: 100

4. **Use Appropriate File Targets**

   .. code-block:: yaml

      # Configuration files → prompt
      config:
        target: "prompt"

      # Data for analysis → ci
      dataset:
        target: "ci"

      # Documents for search → fs
      documentation:
        target: "fs"

5. **Test with Dry Run**

   Always test your templates before live execution:

   .. code-block:: bash

      ostruct runx my-tool.ost "test input" --dry-run

6. **Handle Optional Variables**

   .. code-block:: jinja

      {% if config_file is defined %}
      Configuration: {{ config_file.content }}
      {% endif %}

Common Patterns
===============

Configuration File Pattern
---------------------------

.. code-block:: yaml

   cli:
     options:
       config:
         names: ["--config", "-c"]
         help: Configuration file
         type: "file"
         target: "prompt"
         default: "config.yaml"

Template usage:

.. code-block:: jinja

   {% if config is defined %}
   Configuration settings:
   {{ config.content }}
   {% endif %}

Data Analysis Pattern
---------------------

.. code-block:: yaml

   cli:
     options:
       data:
         names: ["--data", "-d"]
         help: Data file for analysis
         type: "file"
         target: "ci"

       output_dir:
         names: ["--output-dir", "-o"]
         help: Output directory for results
         default: "./results"

Multi-Tool Pattern
-------------------

.. code-block:: yaml

   cli:
     options:
       analysis_data:
         names: ["--data"]
         type: "file"
         target: "ci"  # Code Interpreter

       documentation:
         names: ["--docs"]
         type: "file"
         target: "fs"  # File Search

       config:
         names: ["--config"]
         type: "file"
         target: "prompt"  # Template only

Troubleshooting
===============

Common Issues
-------------

**Template variables not found:**

.. code-block:: jinja

   # Wrong
   {{ my_file }}

   # Correct
   {{ my_file.content }}

**Boolean flags not working:**

.. code-block:: yaml

   # Wrong
   verbose:
     names: ["--verbose"]
     type: "boolean"

   # Correct
   verbose:
     names: ["--verbose"]
     action: "store_true"

**File not accessible:**

Check the target specification:

.. code-block:: yaml

   # For template access
   config:
     target: "prompt"

   # For Code Interpreter
   data:
     target: "ci"

**Schema validation errors:**

Use the Schema Generator tool:

.. code-block:: bash

   tools/schema-generator/run.sh -o schema.json template.ost

Debug Commands
--------------

.. code-block:: bash

   # Show available variables
   ostruct runx my-tool.ost --template-debug vars

   # Show template expansion
   ostruct runx my-tool.ost --template-debug post-expand

   # Dry run with debug
   ostruct runx my-tool.ost "test" --dry-run --verbose

See Also
========

- :doc:`cli_reference` - Complete CLI documentation
- :doc:`template_guide` - Template creation guide
- :doc:`quickstart` - Getting started tutorial
- :doc:`examples` - Practical examples

Validation Rules Reference
==========================

OST frontmatter is validated according to these rules:

Top-Level Fields
----------------

**Allowed fields:** ``cli``, ``schema``, ``defaults``, ``global_args``, ``global_flags``

Any other top-level fields will result in a validation error.

Required Sections
-----------------

**cli section:**
  - Must be a YAML object
  - Must contain ``name`` (non-empty string)
  - Must contain ``description`` (non-empty string)

**schema section:**
  - Must be a non-empty string containing valid JSON schema

CLI Section Validation
----------------------

**positional** (optional):
  - Must be a list of objects
  - Each positional argument must have a ``name`` field (non-empty string)

**options** (optional):
  - Must be a YAML object or list
  - No specific field validation (handled by Click at runtime)

Global Arguments Validation
---------------------------

**global_args location:** Can be at top-level OR inside CLI section

**pass_through_global field:**
  - Must be a boolean value when present
  - Controls whether unknown global flags are allowed

**Policy objects:** All other fields in global_args must be objects with:
  - Required ``mode`` field with value: ``"fixed"``, ``"pass-through"``, ``"allowed"``, or ``"blocked"``
  - Additional fields depend on mode (e.g., ``value`` for fixed, ``allowed`` list for allowed mode)

**global_flags** (optional):
  - Must be a list of strings
  - Strings cannot be empty (will cause validation error)
  - Format: alternating flag names and values (``["--flag", "value", "--other-flag", "other-value"]``)

Error Messages
--------------

**Common validation errors:**

.. code-block:: text

   # Unknown top-level field
   Unknown top-level field 'version'. Allowed fields are: cli, defaults, global_args, global_flags, schema

   # Missing required CLI fields
   'cli' section is missing required field 'name'
   'cli' section is missing required field 'description'

   # Invalid schema
   'schema' must be a non-empty string

   # Invalid global args
   Global arg 'model' config must be an object
   Global arg 'model' must have 'mode' field
   Global arg 'model' mode must be one of: fixed, pass-through, allowed, blocked

Argument Parsing Rules
======================

- **Order matters**: Place flags/options before positional arguments to avoid ambiguity.
- **Example**: my_tool.ost --format json input.txt (good); my_tool.ost input.txt --format json (may fail if --format is unknown to template).
- **Separator**: Use ``--`` to explicitly end flag parsing if needed (e.g., my_tool.ost --format json -- input.txt).

This matches behavior in many CLI tools for predictable parsing.

Argument Parsing Tips
=====================

- **Recommended**: Use `--flag=value` format for flags with values to avoid order issues (e.g., --progress=basic input.txt).
- **Order**: Prefer flags before positionals for best compatibility.
- **Separator**: Use `--` to end flag parsing if needed, but prefer = format for reliability.