Adding workflow locks into Wilmer #7

Merged
merged 2 commits into master from feature/Workflow_Locks on Aug 18, 2024

Conversation

SomeOddCodeGuy
Owner

Workflow locks are a way to handle race conditions that can occur when the responding node finishes before other writing nodes, such as the memory or chat summary nodes. The race condition can only occur when streaming; non-streaming waits for the entire response before showing the message and makes no use of early responses.

Workflow locks do not lock an entire workflow; rather, they lock a workflow at a specific point (wherever you place the WorkflowLock node). Subsequent messages sent to the workflow can still proceed through the workflow up to the point of the lock node. So if the lock is the first node, the whole workflow is locked; if the lock is the third node, subsequent workflow calls can repeatedly hit the first two nodes even while the lock is active, but they cannot progress past the third node until the lock is released.
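To make that gating concrete, here is a minimal sketch of the behavior, using a simple in-memory registry rather than the SQLite-backed storage Wilmer actually uses (described further down). run_workflow, run_node, and WorkflowLockedError are hypothetical names for illustration, not Wilmer's real API:

import threading

_active_locks = set()           # workflowLockId values currently held
_lock_guard = threading.Lock()  # protects _active_locks

class WorkflowLockedError(Exception):
    """Raised when a run reaches a WorkflowLock node whose lock is held."""

def run_workflow(nodes, run_node):
    acquired = []
    try:
        for node in nodes:
            if node.get("type") == "WorkflowLock":
                lock_id = node["workflowLockId"]
                with _lock_guard:
                    if lock_id in _active_locks:
                        # Every node before this point has already run;
                        # only progress past the lock node is blocked.
                        raise WorkflowLockedError(
                            f"Workflow lock '{lock_id}' is in place; "
                            "wait for it to be released")
                    _active_locks.add(lock_id)
                acquired.append(lock_id)
            else:
                run_node(node)
    finally:
        # Locks this run acquired are released once the whole workflow
        # (including any memory/summary nodes) has finished.
        with _lock_guard:
            for lock_id in acquired:
                _active_locks.discard(lock_id)

The key point is that the check happens at the lock node's position in the list, so everything before that position has already executed by the time a second call gets rejected.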

Let's look at some examples below. First, let's look at convoroleplaysinglemodel's workflow lock scenario:

[
  {
    "title": "Workflow Lock",
    "type": "WorkflowLock",
    "workflowLockId": "FullCustomChatSummaryLock"
  },
  {
    "title": "Grab the current summary from file",
    "agentName": "Chat Summary File Puller Agent",
    "type": "GetCurrentSummaryFromFile"
  },
  {
    "title": "LLM Responding to User Request",
    "agentName": "Response Agent Five",
    "systemPrompt": "You are an exceptionally creative AI that specializes in user enjoyment, and you are currently engaged in a roleplay conversation with user via an online chat program.\n\nYou are role playing a character in this conversation. Below, within brackets, are the initial instructions for that role play, including the starting scenario:\n[\n{chat_system_prompt}\n]\n\nSince the roleplay began, some changes to the initial scenario may have occurred through natural progression. A summary of the roleplay's story up to now can be found here:\n[\n{agent1Output}\n]\n\nPlease continue the below conversation, acting out your character. Please do not write dialogue for the user's character, and please keep your own response concise so that the user can have an opportunity to respond as well.\n\nPlease continue the below conversation with a concise reply, continuing your character's roleplay.",
    "prompt": "",
    "lastMessagesToSendInsteadOfPrompt": 20,
    "endpointName": "ConvoRoleplaySingleModelEndpoint",
    "preset": "MidnightMiqu1.0_Recommended",
    "maxResponseSizeInTokens": 800,
    "addUserTurnTemplate": false,
    "returnToUser": true
  },
  {
    "title": "Checking AI's recent memory about this topic",
    "agentName": "Chat Summary",
    "type": "FullChatSummary",
    "isManualConfig": false
  }
]

In the above example, a workflow lock is placed at the start of the workflow. If you are using streaming responses, then when you send your prompt, node 3 ("LLM Responding to User Request") will respond to the user. After the user finishes receiving the response, Wilmer carries on with node 4, which builds memories and the chat summary if necessary. This can take time, but from the user's perspective it may appear that Wilmer is done working and that another message could be sent. If the user attempts to do so while this workflow lock is in place, the prompt will be rejected with a console error stating that a workflow lock is in place and they must wait for it to be released.

Note that this is not the best example of why we would want locks. It's helpful to avoid race conditions, but the real benefit of these locks comes from the next example.

Now let's consider a two-model scenario, which you can find within the template user "convoroleplaytwomodeltemplate":

[
  {
    "title": "Grab the current summary from file",
    "agentName": "Chat Summary File Puller Agent",
    "type": "GetCurrentSummaryFromFile"
  },
  {
    "title": "LLM Responding to User Request",
    "agentName": "Response Agent Five",
    "systemPrompt": "You are an exceptionally creative AI that specializes in user enjoyment, and you are currently engaged in a roleplay conversation with user via an online chat program.\n\nYou are role playing a character in this conversation. Below, within brackets, are the initial instructions for that role play, including the starting scenario:\n[\n{chat_system_prompt}\n]\n\nSince the roleplay began, some changes to the initial scenario may have occurred through natural progression. A summary of the roleplay's story up to now can be found here:\n[\n{agent1Output}\n]\n\nPlease continue the below conversation, acting out your character. Please do not write dialogue for the user's character, and please keep your own response concise so that the user can have an opportunity to respond as well.\n\nPlease continue the below conversation with a concise reply, continuing your character's roleplay.",
    "prompt": "",
    "lastMessagesToSendInsteadOfPrompt": 20,
    "endpointName": "ConvoRoleplayTwoModelResponderEndpoint",
    "preset": "MidnightMiqu1.0_Recommended",
    "maxResponseSizeInTokens": 800,
    "addUserTurnTemplate": false,
    "returnToUser": true
  },
  {
    "title": "Workflow Lock",
    "type": "WorkflowLock",
    "workflowLockId": "FullCustomChatSummaryLock"
  },
  {
    "title": "Checking AI's recent memory about this topic",
    "agentName": "Chat Summary",
    "type": "FullChatSummary",
    "isManualConfig": false
  }
]

In the two-model scenario, the workflow lock comes after the response. There's a reason for this distinction.

With my own setup, I utilize more than one computer with Wilmer. In this case, assume that the two endpoints used by convoroleplaytwomodeltemplate (the responder endpoint and the worker endpoint) are on different computers.

First, the user's prompt is sent to the workflow. The LLM responds in node 2, which the user sees immediately if they are streaming. After this, a workflow lock is placed and the workflow proceeds to build memories and the summary, if applicable.

If the user were to type up a response and send a new prompt, the endpoint call will succeed for nodes 1 and 2 (getting the chat summary from file and responding to the user). Once the LLM has finished responding, if the lock is still active, the workflow will kick back an error that the lock is active.

What this means is that if you have two endpoints, your first prompt is answered by one model, and a lock is created before the second model starts working on memories and the chat summary. Creating memories/summaries can take a long time depending on your model and computer, so this could lock up Wilmer for 5-10 minutes in some cases.

But with this workflow lock, your responder model is still free. While the first workflow session that you triggered is busy writing memories and a summary, you can send as many new prompts as you want to the Wilmer endpoint, causing subsequent workflow sessions to freely hit nodes 1 and 2 (which only use the responder model). This means that you will get responses back quickly, regardless of whether memories or a chat summary must be created.
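As a rough simulation of that timeline, reusing the hypothetical run_workflow and WorkflowLockedError from the earlier sketch (node types are abbreviated stand-ins, and time.sleep stands in for the slow worker-model pass):

import threading, time

TWO_MODEL_NODES = [
    {"type": "GetCurrentSummaryFromFile"},
    {"type": "Respond", "returnToUser": True},  # responder model, fast
    {"type": "WorkflowLock", "workflowLockId": "FullCustomChatSummaryLock"},
    {"type": "FullChatSummary"},                # worker model, slow
]

def run_node(node):
    if node["type"] == "FullChatSummary":
        time.sleep(5)  # stand-in for minutes of memory/summary work
    if node.get("returnToUser"):
        print("response streamed back to the user")

def session(name):
    try:
        run_workflow(TWO_MODEL_NODES, run_node)
        print(f"{name}: memories/summary written, lock released")
    except WorkflowLockedError as err:
        # The user already received the node-2 response; only the
        # background memory/summary pass is skipped for this turn.
        print(f"{name}: {err}")

first = threading.Thread(target=session, args=("first prompt",))
first.start()
time.sleep(1)             # user reads the reply and sends a new prompt
session("second prompt")  # nodes 1-2 run fine, then it stops at the lock
first.join()

Both sessions print a response right away; only the first one goes on to write memories and the summary, while the second stops at the lock node.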

Once the memories/summary are done, the workflow lock is released, and your next prompt can kick off a new round of memory/summary processing.

The benefit of this is speed. A user running this workflow never again needs to wait for memories/summaries. Responses only ever wait on the responder model to answer the prompt, and memories/summary simply happen in the background. This results in a massive speedup for people using this setup. In my own experience, using this workflow with Llama 3.1 70b q8 gguf on an M2 Ultra Mac Studio, I rarely have responses take longer than 60 seconds in total, even for incredibly long conversations. Prior to this change, any response that triggered memories/summary could lock up Wilmer for as long as 5 minutes. With this change, that no longer happens for me.

The workflow locks are maintained in a new SQLite DB that you can find within the Wilmer directory (it will be created if it doesn't exist). Each time Wilmer starts, it clears any existing workflow locks in that DB; so if you ever hit a situation where a lock doesn't clear, you can turn Wilmer off and back on to clear it. Alternatively, locks automatically expire after 10 minutes.
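For illustration, here is a minimal sketch of what such a lock table might look like; the file name, table name, and schema below are assumptions, not Wilmer's actual schema:

import sqlite3, time

LOCK_TTL_SECONDS = 10 * 60  # locks expire after 10 minutes

def open_lock_db(path="WorkflowLocks.db"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS workflow_locks (
                      lock_id     TEXT PRIMARY KEY,
                      acquired_at REAL NOT NULL)""")
    # Mirror the startup behavior described above: clear leftover locks.
    db.execute("DELETE FROM workflow_locks")
    db.commit()
    return db

def is_lock_active(db, lock_id):
    row = db.execute(
        "SELECT acquired_at FROM workflow_locks WHERE lock_id = ?",
        (lock_id,)).fetchone()
    if row is None:
        return False
    if time.time() - row[0] > LOCK_TTL_SECONDS:
        # Treat stale locks as expired and clean them up.
        db.execute("DELETE FROM workflow_locks WHERE lock_id = ?", (lock_id,))
        db.commit()
        return False
    return True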

…date documentation; just ran out of time. I've loosely tested this and it works from my testing, but needs much more testing before I move to main
@SomeOddCodeGuy SomeOddCodeGuy merged commit ac6724d into master Aug 18, 2024
@SomeOddCodeGuy SomeOddCodeGuy deleted the feature/Workflow_Locks branch September 8, 2024 08:02