OpenHosta Pipeline Architecture and Guarded Types Integration

This document describes the inner workings of the code analysis pipeline in OpenHosta and its integration with the Guarded types system, specifically focusing on how complex recursive types are handled during Large Language Model (LLM) prompt generation and response parsing.

1. Overview of the Execution Flow (emulate())

When a function (often decorated or explicitly calling emulate()) is executed, OpenHosta intercepts the call using stack inspection mechanisms:

  1. Interception (inspection.py): The get_hosta_inspection() function walks up the call stack (the current execution frame) to identify which function called emulate().
  2. Analysis (analizer.py): Python's introspection tools (inspect.signature, inspect.getargvalues) read the calling function's signature and execution frame to extract:
    • Its name.
    • Its documentation (__doc__).
    • The arguments passed by the user (names and values).
    • The type annotations (both input and return types).
  3. Creation of the AnalyzedFunction Dataclass: This raw Python information is encapsulated in AnalyzedFunction and AnalyzedArgument objects, which are easy to manipulate and IDE-friendly.
  4. Initialization of the Inspection Object: A wrapper container holding the analysis, current state, logs, and pipeline configuration.
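The interception and analysis steps above can be sketched with the standard inspect module alone. This is a minimal illustration, not OpenHosta's implementation: ArgumentSketch, FunctionSketch, and analyze_caller are hypothetical names standing in for AnalyzedArgument, AnalyzedFunction, and get_hosta_inspection(), and the lookup of the function object by name is a simplifying assumption.

```python
import inspect
from dataclasses import dataclass
from typing import Any


@dataclass
class ArgumentSketch:  # hypothetical stand-in for AnalyzedArgument
    name: str
    value: Any
    annotation: Any


@dataclass
class FunctionSketch:  # hypothetical stand-in for AnalyzedFunction
    name: str
    doc: str
    args: list
    return_type: Any


def analyze_caller() -> FunctionSketch:
    """Describe the function that called analyze_caller()."""
    frame = inspect.currentframe().f_back  # one step up the call stack
    name = frame.f_code.co_name
    # Find the function object by name: module globals first, then the
    # locals of its caller (covers a function called from the scope
    # that defines it).
    func = frame.f_globals.get(name)
    if func is None and frame.f_back is not None:
        func = frame.f_back.f_locals.get(name)
    if func is None:
        raise LookupError(f"cannot locate function object for {name!r}")
    signature = inspect.signature(func)
    arg_info = inspect.getargvalues(frame)  # argument names and current values
    args = [
        ArgumentSketch(n, arg_info.locals[n], signature.parameters[n].annotation)
        for n in arg_info.args
    ]
    return FunctionSketch(name, inspect.getdoc(func) or "", args,
                          signature.return_annotation)


def translate(text: str, language: str = "fr") -> str:
    """Translate text into the given language."""
    # A real OpenHosta function would return emulate() here; the sketch
    # returns the analysis itself so it can be inspected.
    return analyze_caller()


info = translate("hello", language="de")
print(info.name, [(a.name, a.value) for a in info.args], info.return_type)
```

The sketch collects the same four pieces of information listed above (name, docstring, argument names and values, annotations) without the called function having to pass anything explicitly.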

2. The Role of the Pipeline (Prompt Generation - "Push" Phase)

The Pipeline is responsible for converting the Inspection object (representing the Python code state) into prompts that are understandable by a Large Language Model (LLM).

The encode_function method translates the AnalyzedFunction dataclass into strings (prompts). This is where the TypeResolver comes into play for type definitions.

The TypeResolver in Encoding (Push)

The TypeResolver (OpenHosta.guarded.resolver) takes a standard Python type annotation (e.g., tuple[Action, str] or even deeply recursive types like dict[tuple, dict[MyEnum, list[str]]]) and evaluates it recursively.

  1. Transformation into Guarded Types: For instance, tuple[Action, str] is converted into GuardedTuple[GuardedEnum(Action), GuardedUtf8]. A complex type like dict[str, list[int]] becomes GuardedDict[GuardedUtf8, GuardedList[GuardedInt]].
  2. Extraction of Descriptions: Guarded types are specifically designed to be self-describing and to parse themselves using natural language:
    • They provide a literal description via the _type_en class attribute (e.g., "a tuple of (a value from Action enum, a string)").
    • If applicable, they provide a structural data schema via _type_json or the underlying class docstrings.
  3. Injection into the Meta-Prompt: The analyzer dynamically reads these self-describing properties to generate an explanatory block within the prompt provided to the system. This gives the model an explicit statement of the constraints, shapes, and meanings expected.
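The recursive evaluation of an annotation can be sketched with typing.get_origin and typing.get_args. The describe function below is a hypothetical illustration of how a nested annotation could be turned into an English description; it does not use the Guarded classes or the real TypeResolver, and the wording for lists and dicts is an assumption.

```python
import enum
from typing import get_args, get_origin


class Action(enum.Enum):
    START = "start"
    STOP = "stop"


# Assumed descriptions for leaf types (illustrative wording only).
_LEAF = {str: "a string", int: "an integer", float: "a float", bool: "a boolean"}


def describe(annotation) -> str:
    """Recursively turn a type annotation into an English description."""
    origin = get_origin(annotation)  # e.g. tuple for tuple[Action, str]
    args = get_args(annotation)      # e.g. (Action, str)
    if origin is None:
        # Leaf: an enum, a known scalar, or an unparameterized type.
        if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
            return f"a value from {annotation.__name__} enum"
        if annotation in _LEAF:
            return _LEAF[annotation]
        return "a " + getattr(annotation, "__name__", repr(annotation))
    if origin is tuple:
        return "a tuple of (" + ", ".join(describe(a) for a in args) + ")"
    if origin is list:
        return f"a list where each item is {describe(args[0])}"
    if origin is dict:
        return (f"a dict whose keys are {describe(args[0])} "
                f"and whose values are {describe(args[1])}")
    raise TypeError(f"unsupported annotation: {annotation!r}")


print(describe(tuple[Action, str]))
print(describe(dict[str, list[int]]))
```

Because each container delegates to describe for its inner types, arbitrarily nested annotations are handled by the same few branches.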

In meta_prompt.py, variables like {{ function_return_as_python_type }} and {{ function_return_type_name }} rely directly on the string representations generated by these Guarded Types.
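As a rough illustration of how such placeholders receive their values, the sketch below fills a two-line template with a regular expression. The template text, the render helper, and the sample values are all assumptions made for the example; the actual template engine and wording used by meta_prompt.py are not shown here.

```python
import re

# Hypothetical template fragment using the two placeholder names above.
TEMPLATE = (
    "Return type: {{ function_return_type_name }}\n"
    "Python annotation: {{ function_return_as_python_type }}"
)


def render(template: str, values: dict) -> str:
    """Replace each {{ name }} placeholder with values[name]."""
    def substitute(match):
        return str(values[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = render(TEMPLATE, {
    "function_return_type_name": "a tuple of (a value from Action enum, a string)",
    "function_return_as_python_type": "tuple[Action, str]",
})
print(prompt)
```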

3. Processing the LLM Response ("Pull" Phase)

Once the LLM returns text, the retrieval pipeline (pull) must ingest this generated data and cast it into a valid Python object matching the expected return type.

This is the second crucial role of the Guarded types system: its forgiving parsers clean loosely formatted LLM output and cast it into the expected type.

Parsing the Result with type_returned_data

  1. Resolution of Expected Type: The TypeResolver inspects the calling function's return annotation (e.g., AnalyzedFunction.return_type).
  2. Guarded Instantiation: It resolves the expected type into its Guarded class equivalent (e.g., GuardedTuple[GuardedEnum, GuardedUtf8]).
  3. 4-Level Parsing Pipeline (Native -> Heuristic -> Semantic -> Knowledge): The raw text from the LLM is passed to the initializer of the Guarded class (e.g., GuardedTuple("['action_a', 'Explanation...']")). The Guarded type will attempt to clean, parse, and cast this value, handling deeply nested strings if necessary. For complex types like dictionaries of lists, GuardedDict will recursively cast its keys and values using its inner Guarded item types.
  4. Typed Return: The properly cast Python object is returned to the user, complete with associated metadata like the uncertainty level, ensuring robust integration into the rest of the application.
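The recursive casting described above can be sketched as follows. This is a simplified illustration covering only something like the first two levels (a native parse, then a lenient fallback); the Semantic and Knowledge levels and the uncertainty metadata are not modeled. The names loose_parse and cast are hypothetical and are not part of the Guarded API.

```python
import ast
import enum
import json
from typing import get_args, get_origin


class Action(enum.Enum):
    START = "start"
    STOP = "stop"


def loose_parse(text: str):
    """Try strict JSON, then Python literal syntax, else keep the raw text."""
    text = text.strip()
    for loader in (json.loads, ast.literal_eval):
        try:
            return loader(text)
        except (ValueError, SyntaxError, TypeError):
            continue
    return text


def cast(value, annotation):
    """Recursively cast a parsed (or still textual) value to the annotation."""
    if isinstance(value, str) and annotation is not str:
        value = loose_parse(value)  # nested strings are parsed on the way down
    origin = get_origin(annotation)
    args = get_args(annotation)
    if origin is list:
        return [cast(v, args[0]) for v in value]
    if origin is tuple:
        return tuple(cast(v, a) for v, a in zip(value, args))
    if origin is dict:
        return {cast(k, args[0]): cast(v, args[1]) for k, v in value.items()}
    # Leaf types: enums are looked up by value, scalars by calling the type.
    return annotation(value)


print(cast("['start', 'Explanation...']", tuple[Action, str]))
print(cast('{"a": ["1", 2]}', dict[str, list[int]]))
```

The first call falls back from JSON to Python literal syntax because of the single quotes, then casts each tuple element with its own inner type, mirroring how a container type delegates to its item types.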