avoid recency bias in prompt construction #104

Open
AndreasKarasenko opened this issue Jun 18, 2024 · 3 comments

@AndreasKarasenko
Contributor

Context
According to this paper, ChatGPT (and likely other LLMs) suffers from a recency bias: whatever class comes last has a higher probability of being selected.
Issue
Currently scikit-llm constructs prompts based on the order of the training data.
Since restricting the size of the training data is recommended, I would usually do something like this:

df = df.groupby(label_col).apply(lambda x: x.sample(n_samples))  # n_samples examples per class
df = df.reset_index(drop=True)  # drop the MultiIndex created by groupby/apply

This returns a dataframe sorted by label_col. Even if sort=False is passed to groupby, the instances are still clustered by label.

Question/Solution
Should a method be implemented that randomizes the order of samples in the prompt / training data, or should users take care of that themselves?
The most straightforward way would be to simply add this to sampling:

df = df.sample(frac=1)

This leaves it up to chance whether the resulting order is reasonably balanced.
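A more controlled option (just a sketch, assuming a pandas DataFrame df with a label column label_col; interleave_by_label is a hypothetical helper, not an existing scikit-llm feature) would be to shuffle within each class and then interleave the classes round-robin, so that no label systematically ends up last in the prompt:

import pandas as pd

def interleave_by_label(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    # Shuffle within each class, then interleave the classes round-robin
    # so that no label is consistently placed at the end of the prompt.
    groups = [
        g.sample(frac=1, random_state=seed).reset_index(drop=True)
        for _, g in df.groupby(label_col, sort=False)
    ]
    longest = max(len(g) for g in groups)
    rows = [g.iloc[i] for i in range(longest) for g in groups if i < len(g)]
    return pd.DataFrame(rows).reset_index(drop=True)

df = interleave_by_label(df, label_col)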

@OKUA1
Collaborator

OKUA1 commented Jun 18, 2024

Hi @AndreasKarasenko,

Yes, the order of the samples introduces some bias. For the regular FewShot this can easily be solved by permuting the training data. It is not that straightforward in the DynamicFewShot and would require some refactoring.
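For the regular FewShot case, that permutation could look roughly like this (a minimal sketch, assuming the scikit-llm 1.x import path and an already down-sampled X / y; constructor and parameter names may differ between versions):

import numpy as np
# Import path follows the scikit-llm 1.x layout; older versions expose the class at the top level.
from skllm.models.gpt.classification.few_shot import FewShotGPTClassifier

rng = np.random.default_rng(42)
perm = rng.permutation(len(X))            # X: list of texts, y: list of labels
X_shuffled = [X[i] for i in perm]
y_shuffled = [y[i] for i in perm]

clf = FewShotGPTClassifier(model="gpt-3.5-turbo")
clf.fit(X_shuffled, y_shuffled)           # few-shot examples now appear in random order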

On the other hand, I am not sure whether it poses such a big problem. The study you provided is from 2021 and hence relatively outdated.

Also, from my personal observations, sometimes even in the ZeroShot setting the order of the candidate labels matters. Therefore, the prompt would probably always introduce some bias that can hardly be completely avoided.
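One way to probe this in the ZeroShot setting (again just a sketch; the import path and parameter name are assumptions and may differ across scikit-llm versions, and X_test is a placeholder list of texts) is to fit the classifier with differently ordered candidate labels and compare the predictions:

from skllm.models.gpt.classification.zero_shot import ZeroShotGPTClassifier

labels = ["positive", "negative", "neutral"]

clf_a = ZeroShotGPTClassifier(model="gpt-3.5-turbo")
clf_a.fit(None, labels)                   # candidate labels in the original order
clf_b = ZeroShotGPTClassifier(model="gpt-3.5-turbo")
clf_b.fit(None, labels[::-1])             # same labels, reversed order

preds_a = clf_a.predict(X_test)
preds_b = clf_b.predict(X_test)
agreement = sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)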

@AndreasKarasenko
Contributor Author

A forward search yields this paper from 2024, which supports your last point and also points to this paper from 2021/2022. You're probably right that accounting for all biases might be out of scope. Maybe a best-practices section would be appropriate then?

@OKUA1
Collaborator

OKUA1 commented Jun 19, 2024

Yes, I agree that it is a good idea to at least mention it somewhere and in the future think about refactoring the code a bit to minimize this bias.

I will keep the issue open for now.
