Workflow locks are a way to handle race conditions that can occur when the responding node finishes before other writing nodes, such as those that generate memories or the chat summary. This race condition can only occur when streaming; a non-streaming client waits for the entire response before showing the message, and does not make use of early responses.
A workflow lock does not lock the entire workflow; rather, it locks the workflow at a specific point (wherever you place the workflow lock node). Subsequent messages sent to the workflow can still proceed up to the point of the lock node. So if the lock is the first node, the whole workflow is locked; if the lock is the third node, subsequent workflow calls can repeatedly hit the first two nodes even while the lock is active, but they cannot progress past the third node until the lock is released.
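To make the "locks at a point, not the whole workflow" behavior concrete, here is a minimal in-memory sketch. This is not Wilmer's actual implementation (the names `run_workflow`, `WorkflowLockActiveError`, and the `_locks` dict are all illustrative); it only demonstrates the semantics: nodes before the lock position always run, and a request fails at the lock node while an earlier session still holds the lock.

```python
class WorkflowLockActiveError(Exception):
    """Raised when a request reaches a lock node whose lock is still held."""

# A simple in-memory flag per lock id stands in for Wilmer's lock store.
_locks = {}

def run_workflow(nodes, lock_index, lock_id, request):
    """Run `nodes` in order; the lock node sits at position `lock_index`.

    Nodes *before* the lock always execute, even while the lock is held
    by an earlier, still-running workflow session.
    """
    for i, node in enumerate(nodes):
        if i == lock_index:
            if _locks.get(lock_id):
                raise WorkflowLockActiveError(
                    f"workflow lock '{lock_id}' is active; wait for release")
            _locks[lock_id] = True  # acquire the lock for this session
        node(request)

def release(lock_id):
    """Release the lock once the slow tail of the workflow finishes."""
    _locks[lock_id] = False
```

With `lock_index=2`, a second request sent while the lock is held still executes nodes 0 and 1, then raises at the lock node.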
Let's look at some examples. First, consider convoroleplaysinglemodel's workflow lock scenario:
In the above example, a workflow lock is placed at the start of the workflow. If you are using streaming responses, then when you send your prompt, node 3 ("LLM Responding to User Request") responds to the user. Once the user has received the response, Wilmer carries on with node 4, building memories and the chat summary if necessary. This can take time, and from the user's perspective it may appear that Wilmer is done working, so they might send another message. If they do so while this workflow lock is in place, the prompt is rejected, with an error in the console stating that a workflow lock is active and they must wait for it to be released.
Note that this is not the best example of why we would want locks. Avoiding race conditions is helpful, but the real benefit of these locks comes from the next example.
Now let's consider a two-model scenario, which you can find within the template user "convoroleplaytwomodeltemplate":
In the two-model scenario, the workflow lock comes after the response. There's a reason for this distinction.
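The difference between the two placements can be sketched as node orderings. The node names below are illustrative stand-ins, not Wilmer's actual config keys or node types; only the *position* of the lock node matters here.

```python
# Single-model workflow: lock first, so a new prompt is rejected outright
# while memories/summary are still being written.
single_model = [
    "WorkflowLock",                # node 1: lock the workflow at its entry
    "RetrieveChatSummary",         # node 2 (name assumed for illustration)
    "LLMRespondingToUserRequest",  # node 3: the user sees this response
    "WriteMemoriesAndSummary",     # node 4: the slow background work
]

# Two-model workflow: lock *after* the response, so nodes 1-2 stay reachable
# (the responder model stays free) while the worker model writes memories.
two_model = [
    "GetChatSummaryFromFile",      # node 1
    "LLMRespondingToUserRequest",  # node 2: responder endpoint
    "WorkflowLock",                # node 3: lock only the slow tail
    "WriteMemoriesAndSummary",     # node 4: worker endpoint
]
```

In the first ordering everything sits behind the lock; in the second, only the memory/summary tail does.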
My own Wilmer setup uses more than one computer. For this example, assume that the two endpoints used by convoroleplaytwomodeltemplate (the responder endpoint and the worker endpoint) are on different computers.
First, the user's prompt is sent to the workflow. The LLM responds in Node 2, which the user sees immediately if they are streaming. After this, a workflow lock is placed and the workflow proceeds to build memories and summary, if applicable.
If the user were to type up a response and send a new prompt, the call would succeed through nodes 1 and 2 (getting the chat summary from file and responding to the user). Once the LLM has finished responding, if the lock is still active, the workflow kicks back an error stating that the lock is active.
What this means is that with two endpoints, your first prompt is answered by one model, and a lock is then placed before the second model starts working on memories and the chat summary. Creating memories/summaries can take a long time depending on your model and computer, so this could otherwise tie up Wilmer for 5-10 minutes in some cases.
But with this workflow lock, your responder model is still free. While the first workflow session is busy writing memories and a summary, you can send as many new prompts as you want to the Wilmer endpoint; subsequent workflow sessions freely hit nodes 1 and 2 (which only use the responder model). This means you get responses back quickly, regardless of whether memories or a chat summary are still being created.
Once the memories/summary are done, the workflow lock is released, and your next prompt can kick off a new round of memory/summary processing.
The benefit of this is speed. A user of this workflow never needs to wait for memories/summaries again. Responses only ever wait on the responder model answering the prompt, and memories/summary simply happen in the background. This is a massive speedup. In my own experience, using this workflow with Llama 3.1 70b q8 gguf on an M2 Ultra Mac Studio, I rarely have responses take longer than 60 seconds in total, even for incredibly long conversations. Prior to this change, any response that triggered memories/summary could lock up Wilmer for as long as 5 minutes. With this change, that no longer happens for me.
The workflow locks are maintained in a new SQLite DB that you can find within the Wilmer directory (it will be created if it doesn't exist). Each time Wilmer is started, it clears any existing workflow locks in that DB; so if you ever hit a situation where a lock doesn't clear, you can turn Wilmer off and back on to clear them. Alternatively, locks automatically expire after 10 minutes.
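A SQLite-backed lock store with these two safety valves (clear on startup, expire after 10 minutes) can be sketched as follows. The file name, table schema, and function names are assumptions for illustration, not Wilmer's actual ones.

```python
import sqlite3
import time

LOCK_TTL_SECONDS = 600  # locks expire after 10 minutes

def open_lock_db(path="WorkflowLocks.db"):
    """Open the lock DB, creating it if it doesn't exist, and wipe any
    leftover locks, mirroring the clear-on-startup behaviour."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS locks (id TEXT PRIMARY KEY, acquired_at REAL)")
    con.execute("DELETE FROM locks")  # startup clears stale locks
    con.commit()
    return con

def acquire(con, lock_id):
    """Try to take the lock; a lock older than the TTL counts as released."""
    now = time.time()
    row = con.execute(
        "SELECT acquired_at FROM locks WHERE id = ?", (lock_id,)).fetchone()
    if row and now - row[0] < LOCK_TTL_SECONDS:
        return False  # still held by another workflow session
    con.execute(
        "INSERT OR REPLACE INTO locks (id, acquired_at) VALUES (?, ?)",
        (lock_id, now))
    con.commit()
    return True

def release(con, lock_id):
    """Release the lock once memories/summary have been written."""
    con.execute("DELETE FROM locks WHERE id = ?", (lock_id,))
    con.commit()
```

Storing the acquisition timestamp rather than a bare flag is what makes the 10-minute expiry possible: a crashed session's lock simply ages out instead of blocking forever.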