Fixes in Modules code and docs
ddebowczyk committed Jun 6, 2024
1 parent c3ccd2c commit 8c761d3
Showing 5 changed files with 73 additions and 12 deletions.
56 changes: 56 additions & 0 deletions docs/modules.md
@@ -0,0 +1,56 @@
# Modules

> NOTE: This is a work in progress. The documentation is not complete yet.

Modules are a way to encapsulate structured processing logic and data flows. They are inspired by DSPy and TensorFlow modules.

Instructor comes with a set of built-in modules, which can be used to build more complex processing pipelines. You can also create your own modules by extending the provided classes (`Module` or `DynamicModule`) or implementing the interfaces (`CanProcessCall`, `HasPendingExecution`, `HasInputOutputSchema`).

## Module anatomy

A module consists of three important parts:

- `__construct()` - constructor containing the setup of the module components and dependencies,
- `signature()` - method returning the signature of the module, which specifies the expected inputs and resulting output fields,
- `forward()` - method containing the processing logic, which takes the input data and returns the output data, using the module components configured in the constructor.
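
The three parts can be illustrated with a minimal, self-contained sketch. `SketchModule`, `CountWords`, and `run()` are hypothetical stand-ins invented for this illustration; they are not the library's `Module` class, and the real base class may expose a different contract.

```php
<?php
// Hypothetical stand-in for a module base class; NOT the Instructor API.
abstract class SketchModule
{
    abstract public function signature(): string;

    abstract protected function forward(mixed ...$args): array;

    // Helper for this sketch only: passes named arguments straight to forward().
    public function run(mixed ...$args): array
    {
        return $this->forward(...$args);
    }
}

final class CountWords extends SketchModule
{
    // __construct(): set up the components and dependencies the module needs.
    public function __construct(private string $separatorPattern = '/\s+/')
    {
    }

    // signature(): declare the expected inputs and the resulting outputs.
    public function signature(): string
    {
        return 'text: string -> count: int';
    }

    // forward(): processing logic, using what the constructor configured.
    protected function forward(mixed ...$args): array
    {
        $text = (string) ($args['text'] ?? '');
        $words = preg_split($this->separatorPattern, $text, -1, PREG_SPLIT_NO_EMPTY);

        return ['count' => count($words)];
    }
}

$module = new CountWords();
$output = $module->run(text: 'one two  three');
```

The sketch keeps configuration in the constructor and data in `forward()`, so the same instance can be reused for many inputs.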

## Signatures

Signatures are a way to define the expected inputs and resulting outputs of a module. They are also used to validate the input data and to infer the output data.

The signature returned by the `signature()` method can be defined in several ways:

- as a string in the form `input1: type, input2: type -> output1: type1, output2: type2`,
- as an instance of a class implementing the `HasSignature` interface (whose `signature()` method returns the signature string),
- as an instance of the `Signature` class.

A string-based signature is the simplest option, but it is less flexible and may be harder to maintain, especially in more complex cases.
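
To make the string format concrete, here is a rough sketch of how such a string could be split into input and output fields. `parseSignatureString()` and `parseFieldList()` are hypothetical helpers written for this illustration; the library's own parser may accept a different grammar, and defaulting a missing type to `string` is an assumption of the sketch.

```php
<?php
/**
 * Splits "a: type, b: type -> c: type" into input and output field maps.
 * Illustrative only; not the library's parser.
 */
function parseSignatureString(string $signature): array
{
    $sides = explode('->', $signature);
    if (count($sides) !== 2) {
        throw new InvalidArgumentException("Expected exactly one '->' in: {$signature}");
    }

    return [
        'inputs' => parseFieldList($sides[0]),
        'outputs' => parseFieldList($sides[1]),
    ];
}

/** Turns "name: type, other: type" into ['name' => 'type', 'other' => 'type']. */
function parseFieldList(string $list): array
{
    $fields = [];
    foreach (explode(',', $list) as $item) {
        $item = trim($item);
        if ($item === '') {
            continue; // tolerate trailing commas
        }
        $parts = array_map('trim', explode(':', $item, 2));
        if ($parts[0] === '') {
            throw new InvalidArgumentException("Field name missing in: {$item}");
        }
        // Assumption of this sketch: a field without a type is treated as a string.
        $fields[$parts[0]] = $parts[1] ?? 'string';
    }

    return $fields;
}

$parsed = parseSignatureString('text: string, language: string -> subject: string, score: int');
```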

The `SignatureData` base class is a more flexible way to define the inputs and outputs of a module, which can be useful in more complex cases.

Extend the `SignatureData` class and define the fields using the `#[InputField]` and `#[OutputField]` attributes. The fields can have type hints, which are used to validate the input data. The `#[InputField]` and `#[OutputField]` attributes can also contain instructions for the LLM, specifying the inference behavior.
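
The sketch below shows the general mechanism of attribute-annotated fields using plain PHP 8 attributes and reflection. The `InputField` and `OutputField` classes here are local stand-ins declared only so the example runs on its own, and `TranslateSignature` and `describeFields()` are hypothetical. The real attributes ship with the library, their constructor arguments may differ, and a real signature class would extend `SignatureData`.

```php
<?php
// Local stand-ins for the attributes named above; NOT the library's classes.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class InputField
{
    public function __construct(public string $description = '')
    {
    }
}

#[Attribute(Attribute::TARGET_PROPERTY)]
final class OutputField
{
    public function __construct(public string $description = '')
    {
    }
}

// Hypothetical class-based signature (in the library it would extend SignatureData).
final class TranslateSignature
{
    #[InputField('text to translate')]
    public string $text;

    #[InputField('target language')]
    public string $language;

    #[OutputField('translated text')]
    public string $translation;
}

/** Collects annotated properties into input and output field descriptions. */
function describeFields(string $class): array
{
    $fields = ['inputs' => [], 'outputs' => []];
    foreach ((new ReflectionClass($class))->getProperties() as $property) {
        $type = $property->getType();
        $typeName = $type instanceof ReflectionNamedType ? $type->getName() : 'mixed';
        foreach ($property->getAttributes() as $attribute) {
            $group = match ($attribute->getName()) {
                InputField::class => 'inputs',
                OutputField::class => 'outputs',
                default => null,
            };
            if ($group !== null) {
                $fields[$group][$property->getName()] = [
                    'type' => $typeName,
                    'description' => $attribute->newInstance()->description,
                ];
            }
        }
    }

    return $fields;
}

$fields = describeFields(TranslateSignature::class);
```

Keeping the type hint and the instruction next to the property is what makes this form easier to maintain than a long signature string.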

## Calling the module

A module is initialized with input data via the `withArgs()` or `with()` method:

- `withArgs()` - takes the input data fields as arguments; they have to be [named arguments](https://stitcher.io/blog/php-8-named-arguments),
- `with()` - takes the input data as an object implementing the `HasInputOutputData` interface; it can be used if the module has a class-based signature.

The `withArgs()` and `with()` methods, available on any `Module` class, take the input data and create a
`PendingExecution` object, which is executed when you access the results via the `result()` or `get()` methods.
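
The deferred behavior described above can be sketched as follows. `LazyExecution` is a hypothetical stand-in that models only the laziness and caching; it is not the library's `PendingExecution` class, whose internals are not shown in this document.

```php
<?php
// Hypothetical stand-in: runs the work on first access and caches the output.
final class LazyExecution
{
    private bool $executed = false;
    private array $output = [];

    public function __construct(private Closure $work)
    {
    }

    // Triggers execution on first call; later calls reuse the cached output.
    public function result(): array
    {
        if (!$this->executed) {
            $this->output = ($this->work)();
            $this->executed = true;
        }

        return $this->output;
    }

    // Without a name: all fields. With a name: that field, or null if absent.
    public function get(?string $name = null): mixed
    {
        $output = $this->result();

        return $name === null ? $output : ($output[$name] ?? null);
    }
}

$calls = 0; // counts how many times the work actually runs
$pending = new LazyExecution(function () use (&$calls): array {
    $calls++;

    return ['subject' => 'Hello'];
});

$callsBeforeAccess = $calls;         // nothing has been executed yet
$subject = $pending->get('subject'); // first access triggers execution
$all = $pending->result();           // served from the cached output
```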

## Working with results

The result of calling the module via `with()` or `withArgs()` is a `PendingExecution` object, which provides ways to access the module outputs.

The `PendingExecution` object offers several methods to access the output data:

- `result()` - returns the raw output of the module as defined by `forward()` method,
- `try()` - returns the output of the module as a `Result` object, a wrapper around the output data that lets you check whether the output is valid or whether there are any errors before accessing the data,
- `get(string $name)` - returns the value of the specified output field,
- `get()` - returns the output data as an array of key-value pairs of field name => field value.

Additionally, the `PendingExecution` object offers the following methods:

- `errors()` - returns the list of errors that occurred during the execution of the module,
- `hasErrors()` - returns `true` if any errors were encountered during execution of the module.
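
The error-aware access pattern can be sketched like this. `SketchResult` and `GuardedExecution` are hypothetical stand-ins, and `isSuccess()`, `value()`, and `error()` are names invented for the illustration; the library's `Result` and `PendingExecution` classes may offer different methods.

```php
<?php
// Hypothetical result wrapper: either a value or an error message.
final class SketchResult
{
    private function __construct(
        private bool $ok,
        private mixed $value,
        private ?string $error,
    ) {
    }

    public static function success(mixed $value): self
    {
        return new self(true, $value, null);
    }

    public static function failure(string $error): self
    {
        return new self(false, null, $error);
    }

    public function isSuccess(): bool { return $this->ok; }
    public function value(): mixed { return $this->value; }
    public function error(): ?string { return $this->error; }
}

// Hypothetical execution wrapper that records errors instead of throwing.
final class GuardedExecution
{
    private array $errors = [];

    public function __construct(private Closure $work)
    {
    }

    // Runs the work on every call (no caching in this sketch).
    public function try(): SketchResult
    {
        try {
            return SketchResult::success(($this->work)());
        } catch (Throwable $e) {
            $this->errors[] = $e->getMessage();

            return SketchResult::failure($e->getMessage());
        }
    }

    public function errors(): array { return $this->errors; }
    public function hasErrors(): bool { return $this->errors !== []; }
}

$failing = new GuardedExecution(fn () => throw new RuntimeException('missing input: text'));
$result = $failing->try();

if (!$result->isSuccess()) {
    // Inspect the failure before touching any output data.
    $message = $result->error();
}
```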
25 changes: 14 additions & 11 deletions examples/02_Advanced/LanguagePrograms2/run.php
@@ -4,20 +4,23 @@
using LLM in a modular way. This addon to Instructor has been inspired by DSPy
library for Python (https://github.com/stanfordnlp/dspy).

Key components of language program:
- Module subclasses - encapsulate processing logic
- Signatures - define input and output for data processed by modules
This example demonstrates multistep processing with LLMs:
- parse text to extract email data from text (sender, subject and content) -> result is an object containing parsed email data
- fix spelling mistakes in the subject and content fields -> result is an object containing fixed email subject and content
- translate subject into specified language -> result is an object containing translated data

NOTE: Other concepts from DSPy (optimizer, compiler, evaluator) have not been implemented yet.
All the steps are packaged into a single, reusable module, which is easy to call via:

Module consists of 3 key parts:
- __construct() - initialization of module, prepare dependencies, setup submodules
- signature() - define input and output for data processed by module
- forward() - processing logic, return output data
```
(new ProcessEmail)->withArgs(
text: $text,
language: $language,
);
```

`ProcessEmail` inherits from `Module`, which is the base class for Instructor modules. It returns a predefined object containing, in this case, the data from all steps of processing.

`Predict` class is a special module, that uses Instructor's structured processing
capabilities to execute inference on provided inputs and return output in a requested
format.
The outputs and flow can be arbitrarily shaped to the needs of a specific use case (within the bounds of how the Module & Predict components work).

```php
<?php
File renamed without changes.
File renamed without changes.
4 changes: 3 additions & 1 deletion src/Extras/Module/Signature/Signature.php
@@ -2,11 +2,13 @@

namespace Cognesy\Instructor\Extras\Module\Signature;

use Cognesy\Instructor\Extras\Module\Signature\Contracts\HasInputSchema;
use Cognesy\Instructor\Extras\Module\Signature\Contracts\HasOutputSchema;
use Cognesy\Instructor\Extras\Module\Signature\Traits\ConvertsToSignatureString;
use Cognesy\Instructor\Schema\Data\Schema\Schema;


class Signature
class Signature implements HasInputSchema, HasOutputSchema
{
use ConvertsToSignatureString;

