Release/v0.33.0 (#1243)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Matt Vallillo <matt@griptape.ai>
Co-authored-by: dylanholmes <4370153+dylanholmes@users.noreply.github.com>
Co-authored-by: Vasily Vasinov <vasily@griptape.ai>
Co-authored-by: CJ Kindel <cjkindel@users.noreply.github.com>
Co-authored-by: Emily Danielson <2302515+emjay07@users.noreply.github.com>
Co-authored-by: hkhajgiwale <hkhajgiwale@paloaltonetworks.com>
Co-authored-by: Harsh Khajgiwale <13365920+hkhajgiwale@users.noreply.github.com>
Co-authored-by: Anush <anushshetty90@gmail.com>
Co-authored-by: datashaman <marlinf@datashaman.com>
Co-authored-by: Zach Giordano <32624672+zachgiordano@users.noreply.github.com>
Co-authored-by: Andrew French <andrew@afren.ch>
Co-authored-by: Stefano Lottini <stefano.lottini@datastax.com>
Co-authored-by: James Clarendon <SavagePencil@users.noreply.github.com>
Co-authored-by: Michal <salin87@gmail.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: torabshaikh <torab.shaikh@gmail.com>
Co-authored-by: Aodhan Roche <aodhan@griptape.ai>
Co-authored-by: Kyle Roche <kyleroche@users.noreply.github.com>
Co-authored-by: William Price <82848178+william-price01@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: billytrend-cohere <144115527+billytrend-cohere@users.noreply.github.com>
1 parent 04fc257 commit 91fd268
Showing 303 changed files with 3,763 additions and 2,763 deletions.
4 changes: 4 additions & 0 deletions .github/dependabot.yml
@@ -4,6 +4,7 @@ updates:
    directory: "/"
    schedule:
      interval: "weekly"
    versioning-strategy: increase-if-necessary
    groups:
      dependencies:
        dependency-type: "production"
@@ -15,6 +16,9 @@ updates:
        update-types:
          - "minor"
          - "patch"
    allow:
      - dependency-type: production
      - dependency-type: development
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
2 changes: 2 additions & 0 deletions .github/workflows/docs-integration-tests.yml
@@ -123,6 +123,8 @@ jobs:
      QDRANT_CLUSTER_API_KEY: ${{ secrets.INTEG_QDRANT_CLUSTER_API_KEY }}
      ASTRA_DB_API_ENDPOINT: ${{ secrets.INTEG_ASTRA_DB_API_ENDPOINT }}
      ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.INTEG_ASTRA_DB_APPLICATION_TOKEN }}
      TAVILY_API_KEY: ${{ secrets.INTEG_TAVILY_API_KEY }}
      EXA_API_KEY: ${{ secrets.INTEG_EXA_API_KEY }}
    services:
      postgres:
        image: ankane/pgvector:v0.5.0
88 changes: 87 additions & 1 deletion CHANGELOG.md
@@ -6,6 +6,85 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

## [0.33.0] - 2024-10-09

### Added
- `Workflow.input_tasks` and `Workflow.output_tasks` to access the input and output tasks of a Workflow (see the sketch after this list).
- Ability to pass nested list of `Tasks` to `Structure.tasks` allowing for more complex declarative Structure definitions.
- `TavilyWebSearchDriver` to integrate Tavily's web search capabilities.
- `ExaWebSearchDriver` to integrate Exa's web search capabilities.
- `Workflow.outputs` to access the outputs of a Workflow.
- `BaseFileLoader` for Loaders that load from a path.
- `BaseLoader.fetch()` method for fetching data from a source.
- `BaseLoader.parse()` method for parsing fetched data.
- `BaseFileManager.encoding` to specify the encoding when loading and saving files.
- `BaseWebScraperDriver.extract_page()` method for extracting data from an already scraped web page.
- `TextLoaderRetrievalRagModule.chunker` for specifying the chunking strategy.
- `file_utils.get_mime_type` utility for getting the MIME type of a file.
- `BaseRulesetDriver` for loading a `Ruleset` from an external source.
- `LocalRulesetDriver` for loading a `Ruleset` from a local `.json` file.
- `GriptapeCloudRulesetDriver` for loading a `Ruleset` resource from Griptape Cloud.
- Parameter `alias` on `GriptapeCloudConversationMemoryDriver` for fetching a Thread by alias.
- Basic support for OpenAi Structured Output via `OpenAiChatPromptDriver.response_format` parameter.
- Ability to pass callable to `activity.schema` for dynamic schema generation.
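
The following minimal sketch illustrates the new `Workflow` accessors; the task ids and prompts are illustrative, not taken from the release notes, and the declarative `parent_ids` wiring is assumed from the Griptape docs:

```python
from griptape.structures import Workflow
from griptape.tasks import PromptTask

# Declarative Workflow definition; ids and prompts are illustrative.
workflow = Workflow(
    tasks=[
        PromptTask("Research {{ args[0] }}", id="research"),
        PromptTask("Summarize: {{ parent_outputs['research'] }}", id="summary", parent_ids=["research"]),
    ]
)

print([task.id for task in workflow.input_tasks])   # tasks with no parents, e.g. ["research"]
print([task.id for task in workflow.output_tasks])  # tasks with no children, e.g. ["summary"]

workflow.run("vector databases")
print(workflow.outputs)  # output artifacts of the Workflow's output tasks
```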

### Changed
- **BREAKING**: Renamed parameters on several classes to `client` (see the sketch at the end of this section):
- `bedrock_client` on `AmazonBedrockCohereEmbeddingDriver`.
- `bedrock_client` on `AmazonBedrockTitanEmbeddingDriver`.
- `bedrock_client` on `AmazonBedrockImageGenerationDriver`.
- `bedrock_client` on `AmazonBedrockImageQueryDriver`.
- `bedrock_client` on `AmazonBedrockPromptDriver`.
- `sagemaker_client` on `AmazonSageMakerJumpstartEmbeddingDriver`.
- `sagemaker_client` on `AmazonSageMakerJumpstartPromptDriver`.
- `sqs_client` on `AmazonSqsEventListenerDriver`.
- `iotdata_client` on `AwsIotCoreEventListenerDriver`.
- `s3_client` on `AmazonS3FileManagerDriver`.
- `s3_client` on `AwsS3Tool`.
- `iam_client` on `AwsIamTool`.
- `pusher_client` on `PusherEventListenerDriver`.
- `mq` on `MarqoVectorStoreDriver`.
- `model_client` on `GooglePromptDriver`.
- `model_client` on `GoogleTokenizer`.
- **BREAKING**: Renamed parameter `pipe` on `HuggingFacePipelinePromptDriver` to `pipeline`.
- **BREAKING**: Removed `BaseFileManager.default_loader` and `BaseFileManager.loaders`.
- **BREAKING**: Loaders no longer chunk data, use a Chunker to chunk the data.
- **BREAKING**: Removed `fileutils.load_file` and `fileutils.load_files`.
- **BREAKING**: Removed `loaders-dataframe` and `loaders-audio` extras as they are no longer needed.
- **BREAKING**: `TextLoader`, `PdfLoader`, `ImageLoader`, and `AudioLoader` now take a `str | PathLike` instead of `bytes`. Passing `bytes` is still supported but deprecated.
- **BREAKING**: Removed `DataframeLoader`.
- **BREAKING**: Update `pypdf` dependency to `^5.0.1`.
- **BREAKING**: Update `redis` dependency to `^5.1.0`.
- **BREAKING**: Remove `torch` extra from `transformers` dependency. This must be installed separately.
- **BREAKING**: Split `BaseExtractionEngine.extract` into `extract_text` and `extract_artifacts` for consistency with `BaseSummaryEngine`.
- **BREAKING**: `BaseExtractionEngine` no longer catches exceptions and returns `ErrorArtifact`s.
- **BREAKING**: `JsonExtractionEngine.template_schema` is now required.
- **BREAKING**: `CsvExtractionEngine.column_names` is now required.
- **BREAKING**: Renamed `RuleMixin.all_rulesets` to `RuleMixin.rulesets`.
- **BREAKING**: Renamed `GriptapeCloudKnowledgeBaseVectorStoreDriver` to `GriptapeCloudVectorStoreDriver`.
- **BREAKING**: `OpenAiChatPromptDriver.response_format` is now a `dict` instead of a `str`.
- `MarkdownifyWebScraperDriver.DEFAULT_EXCLUDE_TAGS` now includes media/blob-like HTML tags.
- `StructureRunTask` now inherits from `PromptTask`.
- Several places where API clients are initialized are now lazy loaded.
- `BaseVectorStoreDriver.upsert_text_artifacts` now returns a list or dictionary of upserted vector ids.
- `LocalFileManagerDriver.workdir` is now optional.
- `filetype` is now a core dependency.
- `FileManagerTool` now uses `filetype` for more accurate file type detection.
- `BaseFileLoader.load_file()` will now either return a `TextArtifact` or a `BlobArtifact` depending on whether `BaseFileManager.encoding` is set.
- `Structure.output`'s type is now `BaseArtifact` and raises an exception if the output is `None`.
- `JsonExtractionEngine.extract_artifacts` now returns a `ListArtifact[JsonArtifact]`.
- `CsvExtractionEngine.extract_artifacts` now returns a `ListArtifact[CsvRowArtifact]`.
- Removed `manifest.yml` requirements for custom tool creation.
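
For example, a hedged sketch of the new keyword on one of the affected drivers (the Bedrock model id and the `boto3` session are illustrative, not from the release notes):

```python
import boto3

from griptape.drivers import AmazonBedrockPromptDriver

# In 0.33.0 the keyword argument is `client` (previously `bedrock_client`).
driver = AmazonBedrockPromptDriver(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    client=boto3.Session().client("bedrock-runtime"),
)
```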

### Fixed
- Anthropic native Tool calling.
- Empty `ActionsSubtask.thought` being logged.
- `RuleMixin` no longer prevents setting `rulesets` _and_ `rules` at the same time.
- `PromptTask` will merge in its Structure's Rulesets and Rules.
- `PromptTask` not checking whether Structure was set before building Prompt Stack.
- `BaseTask.full_context` context being empty when not connected to a Structure.

## [0.32.0] - 2024-09-17

### Added
@@ -22,8 +101,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- **BREAKING**: Removed `CsvRowArtifact`. Use `TextArtifact` instead.
- **BREAKING**: Removed `DataframeLoader`.
- **BREAKING**: Removed `MediaArtifact`, use `ImageArtifact` or `AudioArtifact` instead.
- **BREAKING**: `CsvLoader`, `DataframeLoader`, and `SqlLoader` now return `list[TextArtifact]`.
- **BREAKING**: `CsvLoader` and `SqlLoader` now return `ListArtifact[TextArtifact]`.
- **BREAKING**: Removed `ImageArtifact.media_type`.
- **BREAKING**: Removed `AudioArtifact.media_type`.
- **BREAKING**: Removed `BlobArtifact.dir_name`.
@@ -44,6 +124,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- Parameter `meta: dict` on `BaseEvent`.
- `AzureOpenAiTextToSpeechDriver`.
- Ability to use Event Listeners as Context Managers for temporarily setting the Event Bus listeners.
- `JsonSchemaRule` for instructing the LLM to output a JSON object that conforms to a schema.
- Ability to use Drivers Configs as Context Managers for temporarily setting the default Drivers.

### Changed
- **BREAKING**: Drivers, Loaders, and Engines now raise exceptions rather than returning `ErrorArtifacts`.
@@ -52,6 +136,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **BREAKING**: `BaseConversationMemoryDriver.load` now returns `tuple[list[Run], dict]`. This represents the runs and metadata.
- **BREAKING**: `BaseConversationMemoryDriver.store` now takes `runs: list[Run]` and `metadata: dict` as input.
- **BREAKING**: Parameter `file_path` on `LocalConversationMemoryDriver` renamed to `persist_file` and is now type `Optional[str]`.
- **BREAKING**: Removed the `__all__` declaration from the `griptape.mixins` module.
- `Defaults.drivers_config.conversation_memory_driver` now defaults to `LocalConversationMemoryDriver` instead of `None`.
- `CsvRowArtifact.to_text()` now includes the header.

@@ -62,6 +147,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Missing `maxTokens` inference parameter in `AmazonBedrockPromptDriver`.
- Incorrect model in `OpenAiDriverConfig`'s `text_to_speech_driver`.
- Crash when using `CohereRerankDriver` with `CsvRowArtifact`s.
- Crash when passing "empty" Artifacts or no Artifacts to `CohereRerankDriver`.


## [0.30.2] - 2024-08-26
159 changes: 159 additions & 0 deletions MIGRATION.md
@@ -1,6 +1,165 @@
# Migration Guide

This document provides instructions for migrating your codebase to accommodate breaking changes introduced in new versions of Griptape.

## 0.32.X to 0.33.X

### Removed `DataframeLoader`

`DataframeLoader` has been removed. Use `CsvLoader.parse` or build `TextArtifact`s from the dataframe instead.

#### Before

```python
DataframeLoader().load(df)
```

#### After
```python
# Convert the dataframe to csv bytes and parse it
CsvLoader().parse(bytes(df.to_csv(line_terminator='\r\n', index=False), encoding='utf-8'))
# Or build TextArtifacts from the dataframe
[TextArtifact(row) for row in df.to_dict(orient="records")]
```

### `TextLoader`, `PdfLoader`, `ImageLoader`, and `AudioLoader` now take a `str | PathLike` instead of `bytes`.

#### Before
```python
PdfLoader().load(Path("attention.pdf").read_bytes())
PdfLoader().load_collection([Path("attention.pdf").read_bytes(), Path("CoT.pdf").read_bytes()])
```

#### After
```python
PdfLoader().load("attention.pdf")
PdfLoader().load_collection([Path("attention.pdf"), "CoT.pdf"])
```

### Removed `fileutils.load_file` and `fileutils.load_files`

`griptape.utils.file_utils.load_file` and `griptape.utils.file_utils.load_files` have been removed.
You can now pass the file path directly to the Loader.

#### Before

```python
PdfLoader().load(load_file("attention.pdf").read_bytes())
PdfLoader().load_collection(list(load_files(["attention.pdf", "CoT.pdf"]).values()))
```

#### After
```python
PdfLoader().load("attention.pdf")
PdfLoader().load_collection(["attention.pdf", "CoT.pdf"])
```

### Loaders no longer chunk data

Loaders no longer chunk the data after loading it. If you need to chunk the data, use a [Chunker](https://docs.griptape.ai/stable/griptape-framework/data/chunkers/) after loading the data.

#### Before

```python
chunks = PdfLoader().load("attention.pdf")
vector_store.upsert_text_artifacts(
    {
        "griptape": chunks,
    }
)
```

#### After
```python
artifact = PdfLoader().load("attention.pdf")
chunks = TextChunker().chunk(artifact)
vector_store.upsert_text_artifacts(
    {
        "griptape": chunks,
    }
)
```
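
`TextChunker`, used above and in the updated docs examples in this release, is one concrete Chunker; its `max_tokens` parameter controls chunk size (the value below is illustrative):

```python
chunks = TextChunker(max_tokens=256).chunk(artifact)
```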


### Removed `torch` extra from `transformers` dependency

The `torch` extra has been removed from the `transformers` dependency. If you require `torch`, install it separately.

#### Before
```bash
pip install griptape[drivers-prompt-huggingface-hub]
```

#### After
```bash
pip install griptape[drivers-prompt-huggingface-hub]
pip install torch
```

### `CsvLoader`, `DataframeLoader`, and `SqlLoader` return types

`CsvLoader`, `DataframeLoader`, and `SqlLoader` now return a `list[TextArtifact]` instead of `list[CsvRowArtifact]`.

If you require a dictionary, set a custom `formatter_fn` and then parse the text to a dictionary.

#### Before

```python
results = CsvLoader().load(Path("people.csv").read_text())

print(results[0].value) # {"name": "John", "age": 30}
print(type(results[0].value)) # <class 'dict'>
```

#### After
```python
results = CsvLoader().load(Path("people.csv").read_text())

print(results[0].value) # name: John\nage: 30
print(type(results[0].value)) # <class 'str'>

# Customize formatter_fn
results = CsvLoader(formatter_fn=lambda x: json.dumps(x)).load(Path("people.csv").read_text())
print(results[0].value) # {"name": "John", "age": 30}
print(type(results[0].value)) # <class 'str'>

dict_results = [json.loads(result.value) for result in results]
print(dict_results[0]) # {"name": "John", "age": 30}
print(type(dict_results[0])) # <class 'dict'>
```

### Renamed `GriptapeCloudKnowledgeBaseVectorStoreDriver` to `GriptapeCloudVectorStoreDriver`.

#### Before
```python
from griptape.drivers import GriptapeCloudKnowledgeBaseVectorStoreDriver

driver = GriptapeCloudKnowledgeBaseVectorStoreDriver(...)
```

#### After
```python
from griptape.drivers import GriptapeCloudVectorStoreDriver

driver = GriptapeCloudVectorStoreDriver(...)
```

### `OpenAiChatPromptDriver.response_format` is now a `dict` instead of a `str`.

`OpenAiChatPromptDriver.response_format` is now structured as the `openai` SDK accepts it.

#### Before
```python
driver = OpenAiChatPromptDriver(
response_format="json_object"
)
```

#### After
```python
driver = OpenAiChatPromptDriver(
response_format={"type": "json_object"}
)
```
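
Because the value is now passed through to the `openai` SDK as-is, other response formats accepted by the SDK can also be supplied. A hedged sketch of the structured-output support mentioned in the changelog (the model and schema are illustrative, and `json_schema` support depends on the model and SDK version):

```python
driver = OpenAiChatPromptDriver(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
                "required": ["name", "age"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)
```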

## 0.31.X to 0.32.X

### Removed `MediaArtifact`
4 changes: 3 additions & 1 deletion docs/examples/src/load_query_and_chat_marqo_1.py
@@ -1,6 +1,7 @@
import os

from griptape import utils
from griptape.chunkers import TextChunker
from griptape.drivers import MarqoVectorStoreDriver, OpenAiEmbeddingDriver
from griptape.loaders import WebLoader
from griptape.structures import Agent
@@ -25,11 +26,12 @@

# Load artifacts from the web
artifacts = WebLoader().load("https://www.griptape.ai")
chunks = TextChunker().chunk(artifacts)

# Upsert the artifacts into the vector store
vector_store.upsert_text_artifacts(
    {
        namespace: artifacts,
        namespace: chunks,
    }
)

7 changes: 4 additions & 3 deletions docs/examples/src/query_webpage_1.py
@@ -1,14 +1,15 @@
import os

from griptape.chunkers import TextChunker
from griptape.drivers import LocalVectorStoreDriver, OpenAiEmbeddingDriver
from griptape.loaders import WebLoader

vector_store = LocalVectorStoreDriver(embedding_driver=OpenAiEmbeddingDriver(api_key=os.environ["OPENAI_API_KEY"]))

artifacts = WebLoader(max_tokens=100).load("https://www.griptape.ai")
artifacts = WebLoader().load("https://www.griptape.ai")
chunks = TextChunker().chunk(artifacts)

for a in artifacts:
    vector_store.upsert_text_artifact(a, namespace="griptape")
vector_store.upsert_text_artifacts({"griptape": chunks})

results = vector_store.query("creativity", count=3, namespace="griptape")

7 changes: 4 additions & 3 deletions docs/examples/src/query_webpage_astra_db_1.py
@@ -1,5 +1,6 @@
import os

from griptape.chunkers import TextChunker
from griptape.drivers import (
    AstraDbVectorStoreDriver,
    OpenAiChatPromptDriver,
@@ -43,9 +44,9 @@
    ),
)

artifacts = WebLoader(max_tokens=256).load(input_blogpost)

vector_store_driver.upsert_text_artifacts({namespace: artifacts})
artifacts = WebLoader().load(input_blogpost)
chunks = TextChunker(max_tokens=256).chunk(artifacts)
vector_store_driver.upsert_text_artifacts({namespace: chunks})

rag_tool = RagTool(
description="A DataStax blog post",
6 changes: 4 additions & 2 deletions docs/examples/src/talk_to_a_pdf_1.py
@@ -1,5 +1,6 @@
import requests

from griptape.chunkers import TextChunker
from griptape.drivers import LocalVectorStoreDriver, OpenAiChatPromptDriver, OpenAiEmbeddingDriver
from griptape.engines.rag import RagEngine
from griptape.engines.rag.modules import PromptResponseRagModule, VectorStoreRetrievalRagModule
@@ -30,9 +31,10 @@
    rag_engine=engine,
)

artifacts = PdfLoader().load(response.content)
artifacts = PdfLoader().parse(response.content)
chunks = TextChunker().chunk(artifacts)

vector_store.upsert_text_artifacts({namespace: artifacts})
vector_store.upsert_text_artifacts({namespace: chunks})

agent = Agent(tools=[rag_tool])
