
Alternating|Tokens and From:To:When #735

Open
HaydenReeve opened this issue Jun 3, 2023 · 39 comments

Comments

@HaydenReeve

Hi!

I'd love to enquire about the ability to use two really powerful features from other platforms.

  • Prompt Alternating, where you write [Human|Duck] and the sampler alternates between the specified tokens on each step.
  • Prompt Editing, where the prompt changes based on how many steps have completed, such as [Photorealistic:Abstract:0.5], which switches artistic styles halfway through sampling.

Thanks!

@WASasquatch
Contributor

WASasquatch commented Jun 4, 2023

> Hi!
>
> I'd love to enquire about the ability to use two really powerful features from other platforms.
>
> * Prompt Alternating, where you write [Human|Duck] and the sampler alternates between the specified tokens on each step.
>
> * Prompt Editing, where the prompt changes based on how many steps have completed, such as [Photorealistic:Abstract:0.5], which switches artistic styles halfway through sampling.
>
> Thanks!

  • Regarding the first one, that's achieved with {Human|Duck}, as in sd-dynamic-prompts and Disco Diffusion in the past.
    • In my WAS Node Suite you can also use <Human|Duck> (though {Human|Duck} also works) with CLIPTextEncode (NSP), and if you have the Advanced CLIPTextEncode node there is another conditioning node with the same features. This allows you to do reproducible dynamic prompts. A bonus of these nodes is that you can create variables, like $|Human Entity with Red Eyes|$, and then elsewhere in the prompt use $1 to print that same text again. Subsequent variables are accessed by order of occurrence, so the second would be $2, and so on (see the sketch after this list).
  • I thought [from:to:when] worked, but maybe not? I know [Photorealistic:0.5] works.
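For illustration only, here is a minimal Python sketch of how that kind of $|...|$ / $1 variable substitution could be parsed. It is not WAS-NS's actual implementation; the regexes and the function name are my own.

import re

def expand_prompt_variables(prompt: str) -> str:
    """Print $|...|$ definitions in place and let $1, $2, ... reuse them by order of occurrence."""
    variables = []

    def define(match):
        variables.append(match.group(1))
        return match.group(1)  # the definition itself prints its own text

    prompt = re.sub(r"\$\|(.+?)\|\$", define, prompt)
    # $N refers back to the N-th captured definition
    return re.sub(r"\$(\d+)", lambda m: variables[int(m.group(1)) - 1], prompt)

print(expand_prompt_variables("$|Human Entity with Red Eyes|$ in fog, $1 reflected in glass"))
# -> Human Entity with Red Eyes in fog, Human Entity with Red Eyes reflected in glass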

@HaydenReeve
Author

HaydenReeve commented Jun 11, 2023

@WASasquatch

RE: {Human|Duck}

The documentation in the README.md says:

> You can use {day|night}, for wildcard/dynamic prompts. With this syntax {wild|card|test} will be randomly replaced by either "wild", "card" or "test" by the frontend every time you queue the prompt. To use {} characters in your actual prompt escape them like: \{ or \}.

This is slightly different functionality from what I am referencing here.

That alternates each time a prompt is queued, not at each step of the latent diffusion. The functionality I am describing would, by switching at every step, produce a Human/Duck hybrid... thing. The current {Human|Duck} behaviour generates either a Human or a Duck.
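To make the distinction concrete, here is a minimal, hypothetical sketch (not ComfyUI's actual code): the frontend wildcard picks one option per queued prompt, while true prompt alternating would switch options on every sampling step.

import random

options = ["Human", "Duck"]

# Frontend-style wildcard: one choice per queued prompt, the same for every step.
queued_prompt = f"{random.choice(options)}, portrait"

# Per-step alternating: the effective prompt changes on every diffusion step.
def prompt_for_step(step: int) -> str:
    return f"{options[step % len(options)]}, portrait"

for step in range(4):
    print(step, prompt_for_step(step))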

RE: [from:to:when], this one most certainly doesn't work as expected.

This prompt does not appear to work.
Man, [Apple:Fire:0.8] produces
[image]
While this prompt Man, [Apple::0.8] produces
[image]

I tried [Duck|Man:0.2] and [Duck|Man:0.8]
[image]
[image]

Finally, the [Photorealistic:0.5] example also wouldn't work as expected. I tested using an obvious prompt, Neon. Here are Man, , Man, [Neon:0.9], and Man, [Neon:0.1]
[image]
[image]
[image]

While they do appear to have an effect on the image, they don't work as a sequencer or as a blend method.

@WASasquatch
Contributor

Oh, to do it within a single diffusion run you have to use start/stop steps with multiple samplers. As for the second one, that sucks. Though this works similarly for me: ([apple:0.5] [fire:0.5]:1.1), where each only occurs for half the steps.
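For reference, a sketch of the two-sampler split I mean, written as pseudocode rather than runnable ComfyUI code; the parameter names follow the KSampler (Advanced) widgets as I understand them, and the step numbers are illustrative.

cond_apple = CLIPTextEncode(clip, "Man, apple")
cond_fire = CLIPTextEncode(clip, "Man, fire")

# First half: sample steps 0-10 with the first conditioning and keep the leftover noise.
latent = KSamplerAdvanced(model, positive=cond_apple, negative=neg, latent_image=empty_latent,
                          steps=20, start_at_step=0, end_at_step=10,
                          add_noise=True, return_with_leftover_noise=True)

# Second half: continue the same latent for steps 10-20 with the second conditioning.
latent = KSamplerAdvanced(model, positive=cond_fire, negative=neg, latent_image=latent,
                          steps=20, start_at_step=10, end_at_step=20,
                          add_noise=False, return_with_leftover_noise=False)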

@HaydenReeve
Author

I assume you're talking about multiple samplers retaining noise between each? I can't imagine doing that in an easy fashion; for true per-step alternation on a 20-step prompt you'd need twenty samplers.

I'll have a look at ([apple:0.5] [fire:0.5]:1.1), but to me that reads as two words, apple and fire, each reduced to 50% weighting and then the pair boosted by 10%. That doesn't seem to be how it behaves from playing around, but I haven't found another explanation that works as expected.

Most surprisingly, I've noticed a lot of things that seem to work, but when I try to reduce or test the effect, the hypothesis falls apart.

The two features, [Cat|Dog] and [Abstract:Photorealistic:0.8], are things A1111 does natively, and I was wondering whether they'd be officially supported at some point by Comfy.

@WASasquatch
Contributor

WASasquatch commented Jun 12, 2023

Brackets aren't decreased weight, they're decreased steps as far as I'm aware. Lowering weight is done with parentheses and a low weight value. Brackets control the token's occurrence in the diffusion, so 0.5 would be 50% of the steps, i.e. 10 steps of a 20-step run.

The issue with ComfyUI is that we encode text early so we can do stuff with it: combine, mix, etc., then feed it into a sampler already encoded. A1111 encodes text on the fly at diffusion time, so at each diffusion step it can parse the text differently.
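As a rough conceptual sketch of that difference (not either codebase's actual implementation; the helper names are invented, and in practice A1111 precomputes a schedule of conditionings rather than literally re-encoding every step):

# ComfyUI-style: the prompt is encoded once, before sampling begins.
cond = clip_encode(prompt_text)
for step in range(num_steps):
    latent = sample_step(latent, cond, step)

# A1111-style prompt editing: the effective prompt is resolved per step,
# so syntax like [from:to:when] or [a|b] can change what the sampler sees mid-run.
for step in range(num_steps):
    cond = clip_encode(resolve_prompt(prompt_text, step, num_steps))
    latent = sample_step(latent, cond, step)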

@taabata

taabata commented Jun 13, 2023

Yeah, both features are so powerful... if there is a way to implement them in ComfyUI, that would be great.

@HaydenReeve
Author

HaydenReeve commented Jun 13, 2023

> Brackets aren't decreased weight, they're decreased steps as far as I'm aware. Lowering weight is done with parentheses and a low weight value. Brackets control the token's occurrence in the diffusion, so 0.5 would be 50% of the steps, i.e. 10 steps of a 20-step run.

That doesn't make sense considering the Man, [Neon:0.X] prompts above. Man, [Neon:0.1] should apply the Neon token for only 10% of the steps, but the result is virtually identical to the Man, [Neon:0.9] diffusion.

I suppose that equally means it doesn't lower the weighting either.

I imagine, realistically, the tokens do nothing special, but because order/spaces/special characters can matter so much, the image is altered slightly and that gives the impression of an effect.

@WASasquatch
Contributor

WASasquatch commented Jun 13, 2023

> That doesn't make sense considering the Man, [Neon:0.X] prompts above. Man, [Neon:0.1] should apply the Neon token for only 10% of the steps, but the result is virtually identical to the Man, [Neon:0.9] diffusion.

To me, the literal "neon sign" portion is being omitted while the colors derived from the initial noise by the prompt stay the same. Notice in the 0.1 image there are fewer defined "actual" lights, and the sign's neon lettering is nearly gone. This is also why I demonstrated weighting the bracketed bit, like man, ([neon:0.1]:1.2) or something to taste.

It may not work the way I think it does, but from my testing it seems to; being arbitrary, without actual defined step control, it's very random and hard to control from seed to seed / prompt to prompt.

@HaydenReeve
Author

HaydenReeve commented Jun 13, 2023

That's a reasonable assertion.

We can test this, I believe, quite easily.

Take the following prompt: Man, Black and White,
[image]

Now let's add Pink, [Pink:0.9], and [Pink:0.1] to that prompt respectively.
[image]
[image]
[image]

Let's also inspect [Pink:1] and [Pink:1.0]
[image]
[image]

At least with the Pink token, I don't believe this assertion holds up. The rather eclectic results are quite interesting, and if [Pink:0.1] really is a token applied for 10% of the total steps, it doesn't function how you would expect it to, at the very least.

All of these tests were performed with 10 steps, Euler, Karras schedule, which means 10% would be precisely one step.

@HaydenReeve
Author

I am wondering if there would be value in a custom node that prepares the prompts but does not encode them until a later node, decoupling those functionalities from the node itself. That could allow the encoding to happen at or just before the sampler, whether in custom nodes or officially supported ones.

I've found myself wishing more than once that I could easily append tokens to a prompt without having to combine the encodings at a later stage, e.g. for prompt morphing across two or three resampling stages.

@WASasquatch
Contributor

> I am wondering if there would be value in a custom node that prepares the prompts but does not encode them until a later node, decoupling those functionalities from the node itself. That could allow the encoding to happen at or just before the sampler, whether in custom nodes or officially supported ones.
>
> I've found myself wishing more than once that I could easily append tokens to a prompt without having to combine the encodings at a later stage, e.g. for prompt morphing across two or three resampling stages.

WAS-NS has text-editing nodes to set up prompts before putting them through a Text to Conditioning or other conditioning node. There are also Tokens for saving your own custom stuff. The NSP conditioning nodes under WAS Suite/Conditioning allow you to use <one|two|three> random prompts, which are reproducible by conditioning seed. They also support prompt variables, so you could do something like $|__color__|$_lights, $1_sign and it would be parsed into something like red_lights, red_sign.

As far as a true [from:to:when] goes, I think we need @comfyanonymous to confirm how to do it, or whether it even exists.

@WASasquatch
Contributor

WASasquatch commented Jun 13, 2023

Also, here is my test:

A man
[image]

A man, [pink:0.001]
[image]

A man, pink
[image]

A man, (pink:-1.0)
[image]

To me, it still seems that using brackets creates a less defined effect. Pink isn't applied to specific things, it's just present in the initial noise, which nudges the resulting image to incorporate it somehow, but not in a defined way.

In fact, it seems using the brackets brings down the whole fidelity of the image, which may be related to step skipping?

Another reason I think it works at least similarly is that stuff like this is possible, whereas with regular weighting it is harder to control:

A man, ([pink_hair:0.5] mixed with [purple_hair:0.5]:1.2)
[image]

A man, pink_hair mixed with purple_hair
[image]

When just prompting it, or weighting up one or the other, one color tends to be more dominant than the other, and it's hard to get a good mix even across tons of gens.

@taabata

taabata commented Jun 17, 2023

> Hi!
>
> I'd love to enquire about the ability to use two really powerful features from other platforms.
>
> • Prompt Alternating, where you write [Human|Duck] and the sampler alternates between the specified tokens on each step.
> • Prompt Editing, where the prompt changes based on how many steps have completed, such as [Photorealistic:Abstract:0.5], which switches artistic styles halfway through sampling.
>
> Thanks!

Check out the custom node I created:
https://github.com/taabata/Comfy_custom_nodes
[screenshot]

@SadaleNet

I've encountered this exact problem.

And I've just developed a solution for that. The idea is to create a KSamplerAdvanced node for each step, then put a custom CLIPTextEncodeA1111 node before it that converts the A1111-style prompt to a standard prompt, and use a textbox to feed the A1111-like prompt to all of the CLIPTextEncodeA1111 nodes.

Unlike @taabata's solution, mine has the potential to support ControlNet. However, it is messy and requires a lot of nodes (which can be automatically generated using a script included in my repo). The syntax is slightly different from A1111's, though, because I don't want to use ':', as the same character is also used for embeddings in ComfyUI. My solution also supports recursive syntax.

Here's the repo: https://github.com/SadaleNet/CLIPTextEncodeA1111-ComfyUI

[image]
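For illustration, a minimal, hypothetical sketch of the kind of per-step prompt resolution such a conversion performs. This is not the actual CLIPTextEncodeA1111 code, and it ignores nesting, escaping, and the modified separator mentioned above.

import re

def resolve_a1111_prompt(prompt: str, step: int, total_steps: int) -> str:
    """Resolve [a|b] alternation and [from:to:when] editing for a single step (simplified)."""
    def alternate(m):
        options = m.group(1).split("|")
        return options[step % len(options)]  # cycle through the options each step

    def edit(m):
        before, after, when = m.group(1), m.group(2), float(m.group(3))
        return before if step < when * total_steps else after  # switch at the given fraction

    prompt = re.sub(r"\[([^\[\]:|]+(?:\|[^\[\]:|]+)+)\]", alternate, prompt)
    return re.sub(r"\[([^\[\]:|]*):([^\[\]:|]*):([\d.]+)\]", edit, prompt)

for s in range(4):
    print(resolve_a1111_prompt("[Human|Duck], [Photorealistic:Abstract:0.5]", s, 4))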

@ltdrdata
Contributor

> I've encountered this exact problem.
>
> And I've just developed a solution for that. The idea is to create a KSamplerAdvanced node for each step, then put a custom CLIPTextEncodeA1111 node before it that converts the A1111-style prompt to a standard prompt, and use a textbox to feed the A1111-like prompt to all of the CLIPTextEncodeA1111 nodes.
>
> Unlike @taabata's solution, mine has the potential to support ControlNet. However, it is messy and requires a lot of nodes (which can be automatically generated using a script included in my repo). The syntax is slightly different from A1111's, though, because I don't want to use ':', as the same character is also used for embeddings in ComfyUI. My solution also supports recursive syntax.
>
> Here's the repo: https://github.com/SadaleNet/CLIPTextEncodeA1111-ComfyUI
>
> [image]

Recently someone implemented this. Try this.
https://github.com/asagi4/comfyui-prompt-control

@coreyryanhanson
Contributor

Rather than having a custom node that tries to do everything at once, or having a ton of different nodes for each step, would it not make sense to have a literal "step" parameter in the KSampler (Advanced) node? It could function like the third argument of a Python range call (start, stop, step) and be called something like "increment" to be less confusing.

You'd be able to achieve the [cat|dog] effect in a more powerful (but more verbose) way using just two KSampler (Advanced) nodes that are offset by one in their start step, plus their respective prompt nodes.
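In other words, with a hypothetical "increment" parameter, each sampler would only run every other step, much like the third argument of Python's range:

total_steps = 8

# Hypothetical: sampler A runs the even steps with the "cat" prompt,
# sampler B runs the odd steps with the "dog" prompt.
sampler_a_steps = list(range(0, total_steps, 2))  # [0, 2, 4, 6]
sampler_b_steps = list(range(1, total_steps, 2))  # [1, 3, 5, 7]
print(sampler_a_steps, sampler_b_steps)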

@WASasquatch
Contributor

> Rather than having a custom node that tries to do everything at once, or having a ton of different nodes for each step, would it not make sense to have a literal "step" parameter in the KSampler (Advanced) node? It could function like the third argument of a Python range call (start, stop, step) and be called something like "increment" to be less confusing.
>
> You'd be able to achieve the [cat|dog] effect in a more powerful (but more verbose) way using just two KSampler (Advanced) nodes that are offset by one in their start step, plus their respective prompt nodes.

I agree with this logic. Being able to step... the step... would allow you to do this elegantly with KSampler Advanced. @comfyanonymous, does this seem logical?

@SadaleNet

> Rather than having a custom node that tries to do everything at once, or having a ton of different nodes for each step, would it not make sense to have a literal "step" parameter in the KSampler (Advanced) node? It could function like the third argument of a Python range call (start, stop, step) and be called something like "increment" to be less confusing.
>
> You'd be able to achieve the [cat|dog] effect in a more powerful (but more verbose) way using just two KSampler (Advanced) nodes that are offset by one in their start step, plus their respective prompt nodes.

I don't think this idea would work. It'd require the latent output and latent input of the two KSamplerAdvanced nodes to connect to each other, which would create a cycle in the graph.

@MoonRide303
Contributor

+1 for that feature. I was often using both alternating words ([cow|horse]) and the [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results / transitions in A1111 during a single sampling pass.
It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.
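For reference, my reading of how the three forms behave, shown as a small sketch; the helper function is hypothetical and the step fractions are illustrative.

def edited(from_text: str, to_text: str, when: float, progress: float) -> str:
    # progress = current_step / total_steps
    return from_text if progress < when else to_text

# [cow:horse:0.5] -> "cow" for the first half of the steps, "horse" for the second half
# [horse:0.5]     -> nothing at first, "horse" added from the halfway point on
# [cow::0.5]      -> "cow" at first, removed from the halfway point on
print(edited("cow", "horse", 0.5, 0.25))  # cow
print(edited("", "horse", 0.5, 0.75))     # horse
print(edited("cow", "", 0.5, 0.75))       # (empty)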

@ltdrdata
Contributor

ltdrdata commented Aug 5, 2023

> +1 for that feature. I was often using both alternating words ([cow|horse]) and the [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results / transitions in A1111 during a single sampling pass. It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.

ComfyUI now supports ConditioningSetTimestepRange.

@WASasquatch
Contributor

WASasquatch commented Aug 5, 2023

> > +1 for that feature. I was often using both alternating words ([cow|horse]) and the [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results / transitions in A1111 during a single sampling pass. It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.
>
> ComfyUI now supports ConditioningSetTimestepRange.

Is there an example of how to do this with that node? I wasn't getting the same sort of results, but I'm not exactly sure how to use it, only what seems like the way to do it.

@MoonRide303
Contributor

MoonRide303 commented Aug 5, 2023

> > +1 for that feature. I was often using both alternating words ([cow|horse]) and the [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results / transitions in A1111 during a single sampling pass. It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.
>
> ComfyUI now supports ConditioningSetTimestepRange.

The thing is that for more complex prompts and setups with multiple prompts / CLIP encoders, we'd quickly be flooded with nodes. A sample (and still relatively simple) prompt from A1111:

[dslr photography : oil on canvas painting : 0.1] of a [blue | red] sphere in the city, [dark ink : airbrush : 0.25], dark cyberpunk future, high quality, high resolution
Negative prompt: low quality, low resolution
Steps: 30, Sampler: Euler, CFG scale: 7, Seed: 0, Size: 1024x1024, Model hash: e6bb9ea85b, Model: sd_xl_base_1.0_0.9vae, Clip skip: 2, Score: 7.19, Version: v1.5.1

and the output:
[image]

It's very easy and fun to make that kind of transitions in A1111, and it works pretty well.

Doing something like that via extra nodes would basically mean that for every unique combination of the prompt we would have to create duplicates of prompt and conditioning nodes.

And imagine doing it with more advanced flows. For example, my basic setup for SDXL is 3 positive + 3 negative prompts (one per text encoder: base G+, base G-, base L+, base L-, refiner+, refiner-). If I wanted to do transitions like the example above in ComfyUI, I would have to create several times more nodes just to handle that prompt. And every time I wanted to add or remove a transition in the prompt, I would have to reconfigure the whole flow.

The prompt2prompt way looks like a much better idea to me, to be honest. If anyone would like to (and/or knows how to) implement it in ComfyUI, here is the original implementation of this feature from Doggettx, and here is v2 (might be useful as a reference). It would probably work best if it were included in basic ComfyUI functionality (not as custom nodes).

@coreyryanhanson
Contributor

> Is there an example of how to do this with that node? I wasn't getting the same sort of results, but I'm not exactly sure how to use it, only what seems like the way to do it.

For the from:to:when case, you would set the start and end range on each prompt's conditioning and then pipe both into a Conditioning (Combine).
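A minimal sketch of that wiring for one transition from the earlier example, written as pseudocode; the node and parameter names are taken from the UI as I understand them, and the 0.1 split point is illustrative.

# Emulating [dslr photography : oil on canvas painting : 0.1] with timestep ranges
cond_a = CLIPTextEncode(clip, "dslr photography of a sphere in the city")
cond_a = ConditioningSetTimestepRange(cond_a, start=0.0, end=0.1)

cond_b = CLIPTextEncode(clip, "oil on canvas painting of a sphere in the city")
cond_b = ConditioningSetTimestepRange(cond_b, start=0.1, end=1.0)

positive = ConditioningCombine(cond_a, cond_b)  # feed this into the sampler's positive input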

@taabata

taabata commented Aug 6, 2023

The custom node I created allows for token alternating and prompt editing, with ControlNet support as well.
Link:
https://github.com/taabata/Comfy_Syrian_Falcon_Nodes/tree/main

[screenshot]

@Lex-DRL

Lex-DRL commented Aug 11, 2023

I'm late to the party, but +1 for the request.

> ComfyUI now supports ConditioningSetTimestepRange.

@ltdrdata If I get it right, this node can be used as an alternative to the [from:to:when] syntax. But:

  • It still requires us to manually split the text prompt into pieces. What if a prompt contains multiple such entries, each with its own switch point? That can quickly require literally dozens of nodes just for that.
  • As far as I can see, there's still no alternative to the [cow|horse] syntax, which is usually used with multiple entries, too. The prompt [grey|white|brown] [cow|horse] on a [grass|field|courtyard|lawn|glade] immediately creates 3*2*5 = 30 prompt variants, which currently can be achieved in ComfyUI only with 30 text node copies and an INSANELY intertwined graph.

Worst of all, both solutions make the node network prompt-dependent.

So... is it planned to implement an actual equivalent for this syntax?

@ltdrdata
Contributor

> I'm late to the party, but +1 for the request.
>
> > ComfyUI now supports ConditioningSetTimestepRange.
>
> @ltdrdata If I get it right, this node can be used as an alternative to the [from:to:when] syntax. But:
>
> • It still requires us to manually split the text prompt into pieces. What if a prompt contains multiple such entries, each with its own switch point? That can quickly require literally dozens of nodes just for that.
> • As far as I can see, there's still no alternative to the [cow|horse] syntax, which is usually used with multiple entries, too. The prompt [grey|white|brown] [cow|horse] on a [grass|field|courtyard|lawn|glade] immediately creates 3*2*5 = 30 prompt variants, which currently can be achieved only with 30 text node copies and an INSANELY intertwined graph.
>
> Worst of all, both solutions make the node network prompt-dependent.
>
> So... is it planned to implement an actual equivalent for this syntax?

Yeah. It seems that we need to develop a wrapper.

@Lex-DRL

Lex-DRL commented Aug 11, 2023

@ltdrdata
Maybe you don't need an edge-case wrapper. Maybe you need an extension of the current data type plus an upgrade to the existing nodes.

Sorry if what I'm about to suggest doesn't make sense (if so, disregard this comment): I'm not sure about the specific Python implementation of data flow in ComfyUI. But maybe, instead of a new uber-all-in-one node, what we need is something like a "conditioning v2" data type (between nodes), treated not as a single data instance but as an iterator handle over such data (a rough sketch follows the list below).

  • I assume the current conditioning connection passes data through only once, at evaluation start. In contrast, dependent nodes connected via conditioning v2 would request the data instance at each step.
  • It's the source node's responsibility to decide what it outputs. It may output the same conditioning at each step, or generate different ones.
  • If the current (legacy) data type is connected to a node with the newer input type, it is simply converted into an infinite iterator of the same conditioning.
  • To let dependent nodes do one-time work only once, some metadata could be attached indicating the number of unique conditioning objects generated, their IDs, etc.
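A rough Python sketch of what such an iterator-style conditioning could look like, purely to illustrate the idea; none of these names exist in ComfyUI.

from typing import Callable

# A "conditioning v2" source: a callable that returns one conditioning per step.
def alternating_conditioning(cond_a, cond_b) -> Callable[[int], object]:
    def for_step(step: int):
        return cond_a if step % 2 == 0 else cond_b
    return for_step

# A legacy (single) conditioning is adapted by always returning the same instance.
def legacy_as_v2(cond) -> Callable[[int], object]:
    return lambda step: cond

# The sampler would then request the conditioning at every step:
# for step in range(num_steps):
#     cond = positive_v2(step)
#     latent = sample_step(latent, cond, step)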

@tbrebant

I'm also late to the party, and I will also +1 this request.
I tried the custom nodes presented in this thread.
Sadly, @taabata's one is not working for me; I got several different errors that I solved before hitting one that I did not understand.
@SadaleNet's one works well on my machine but is not scalable.

Prompt alternating is a great way to achieve effects that are hard to obtain any other way.

@comfyanonymous
Owner

Conditioning concat or combine should give results that are close to alternating prompts.

@Lex-DRL

Lex-DRL commented Aug 17, 2023

The issue is, it makes the graph unmanageable. To do it with conditioning concat, we need to manually split a single prompt into multiple nodes... and the split point usually moves within the prompt, which makes the prompting process unnecessarily overcomplicated.

@SadaleNet

By adapting the code of the new ConditioningSetTimestepRange, it should technically be possible to turn my solution (or other similar solutions) into a scalable one. I don't have the time to do that right now. If someone happens to have implemented a working solution, please do share it here.

@jesudornenkrone

I know the focus is more on being able to model complex workflows, but being able to write these powerful prompts as easily as in A1111 is something I really miss in ComfyUI.

@WASasquatch
Contributor

> I know the focus is more on being able to model complex workflows, but being able to write these powerful prompts as easily as in A1111 is something I really miss in ComfyUI.

It can be done now, but it's down to someone mirroring the existing markdown for it, or taking a novel approach for ComfyUI with some new markdown for prompts. I thought about it, but I don't really want to look into every little possible combo to deal with, since it's not something I use so elaborately.

@shiimizu
Contributor

shiimizu commented Aug 28, 2023

> > +1 for that feature. I was often using both alternating words ([cow|horse]) and the [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results / transitions in A1111 during a single sampling pass. It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.
> >
> > ComfyUI now supports ConditioningSetTimestepRange.
>
> The thing is that for more complex prompts and setups with multiple prompts / CLIP encoders, we'd quickly be flooded with nodes. A sample (and still relatively simple) prompt from A1111:
>
> [dslr photography : oil on canvas painting : 0.1] of a [blue | red] sphere in the city, [dark ink : airbrush : 0.25], dark cyberpunk future, high quality, high resolution
> Negative prompt: low quality, low resolution
> Steps: 30, Sampler: Euler, CFG scale: 7, Seed: 0, Size: 1024x1024, Model hash: e6bb9ea85b, Model: sd_xl_base_1.0_0.9vae, Clip skip: 2, Score: 7.19, Version: v1.5.1

This is easily done with my node.

[image]

[workflow image]

@MoonRide303
Contributor

@shiimizu Thx for the info, I will check it out.

@MoonRide303
Contributor

@shiimizu I took a quick look at the code, and... I see a lot of source files copied from A1111. A better idea could be utilising the recently introduced 'on prompt' events (#765) and having something more aligned with Comfy internals.

@camoody1

@shiimizu I was scanning through this thread tonight looking for a way to use Automatic1111-style syntax in ComfyUI and saw your post about your custom node. I'm just wondering if you're still supporting it, if it's still working, and if you feel it's a good solution to the discussion here. I feel like these A1111 options are so powerful, and it's just crazy to me that Comfy refuses to acknowledge it is lacking in this respect.

Thank you for your time. :)

@dchatel

dchatel commented Jan 31, 2024

Another solution might be to have the CLIPTextEncode node output a schedule of conditionings rather than a single conditioning. Since we already have the ConditioningSetTimestepRange node, I suppose it wouldn't be that hard to do.

Then the KSampler would get a schedule of conditionings instead of a single one (per positive/negative prompt).
That way, the KSampler could work with something like:

# INPUT: pos_conds, neg_conds, where X_conds = [(cond, start, end), ...]
for step in range(num_steps):
    positive = get_the_correct_conditioning(pos_conds, step)  # pick the entry whose (start, end) covers this step
    negative = get_the_correct_conditioning(neg_conds, step)
    do_the_sampling(latent, positive, negative, step)

@Wyko

Wyko commented Apr 11, 2024

+1 to this. It's a very important feature for me
