Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement/591 Previewing of OpenAI Embeddings #622

Merged
merged 25 commits into from
Dec 12, 2023
Merged

Conversation

faisal-alvi
Copy link
Member

@faisal-alvi faisal-alvi commented Nov 24, 2023

Description of the Change

The Post Classification feature within Language Processing allows for previewing settings changes when utilizing the IBM Watson service provider, but not for OpenAI Embeddings. This PR adds previewing for embeddings.

image

Closes #591

How to test the Change

Visit the embedding settings and confirm the previewing works.

Changelog Entry

Added - Previewing of OpenAI Embeddings

Credits

Props @jeffpaul @dkotter @faisal-alvi

Checklist:

  • I agree to follow this project's Code of Conduct.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests pass.

@faisal-alvi faisal-alvi added the type:enhancement New feature or request. label Nov 24, 2023
@faisal-alvi faisal-alvi added this to the 2.6.0 milestone Nov 24, 2023
@faisal-alvi faisal-alvi self-assigned this Nov 24, 2023
@faisal-alvi
Copy link
Member Author

@dkotter this is ready for your review.

@dkotter dkotter modified the milestones: 2.6.0, 2.5.0 Nov 28, 2023
Copy link
Collaborator

@dkotter dkotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple things I'm noticing when testing:

  1. The preview box is overlapping the settings for me. I compared it to the Watson preview box and it seems to be rendering at a larger width, not sure why though:

Preview settings

  1. If the feature isn't turned on the preview box shouldn't show. This is how the Watson preview behaves

@faisal-alvi
Copy link
Member Author

  1. The preview box was not overlapping for me not sure why, but checking again it was overlapping. I saw there was a negative 5rem of margin applied to the preview box (not sure why but) that does not seem okay so I removed that and the box was in the correct position.

Before

image

After

image

  1. feature isn't turned on the preview box shouldn't show

Fixed.

dkotter
dkotter previously approved these changes Nov 29, 2023
@dkotter dkotter requested a review from berkod November 29, 2023 16:46
berkod
berkod previously approved these changes Dec 4, 2023
@jeffpaul jeffpaul mentioned this pull request Dec 6, 2023
22 tasks
@jeffpaul
Copy link
Member

jeffpaul commented Dec 6, 2023

@faisal-alvi @dkotter one item I noticed during UAT today was the Embeddings Preview appears to only show sample recommended terms if they were at or above the threshold set (e.g. 75%). It's possible that the demo site didn't have :ALOT: of sample terms so perhaps this was a sample data issue, but one thing that would be good to check is adding some terms that we know should be below the 75% threshold and then seeing if they appear but under the default 75% threshold in the previewer (e.g. 68%, 54%).

A separate item that I noticed was when Embeddings and Watson were both active was that the Watson Taxonomies were showing in the Embeddings preview even if those taxonomies were not selected in the Embeddings settings. So we should ensure the Embeddings preview only shows the taxonomies that are enabled in the settings there (and probably the same for Watson that only the taxonomies set are what are show in the Watson preview).

… the threshold value. After running the calculation, remove any results that are above our threshold. Use a higher level of precision when converting our decimals to percentages so we can more easily see the differences in each value.
@dkotter dkotter dismissed stale reviews from berkod and themself via ae66799 December 7, 2023 18:59
@dkotter
Copy link
Collaborator

dkotter commented Dec 7, 2023

one item I noticed during UAT today was the Embeddings Preview appears to only show sample recommended terms if they were at or above the threshold set (e.g. 75%). It's possible that the demo site didn't have :ALOT: of sample terms so perhaps this was a sample data issue, but one thing that would be good to check is adding some terms that we know should be below the 75% threshold and then seeing if they appear but under the default 75% threshold in the previewer (e.g. 68%, 54%).

Couple things I noted (and fixed) while testing this today.

  1. The way we were using the new threshold value in our similarity calculation was incorrect. We were passing that value in and using it to set the maximum value that would be returned. What this means is any value that was above the threshold would still be returned but will have it's value set to the threshold value. This is one reason why the same values were showing a lot (like 75%). I've changed this calculation to function more like it used to and added some code that removes any results that are above the threshold
  2. We were trimming down to 2 decimal places after the calculation and then we round to 2 decimal places when rendering. As mentioned before, the scores we get here are all pretty close so we need a higher level of precision to more accurately see the differences between each item. I've increased our calculation from using 2 decimal places to using 10 decimal places (display still only uses 2) and results are more accurately displayed now:

Preview results

  1. The results we display is limited to the Number of terms setting. So in UAT, I had this set to 2 which is why we only ever saw 2 results. I thought about removing this limit during previews but for sites with hundreds or thousands of terms, the UI would not be great

A separate item that I noticed was when Embeddings and Watson were both active was that the Watson Taxonomies were showing in the Embeddings preview even if those taxonomies were not selected in the Embeddings settings. So we should ensure the Embeddings preview only shows the taxonomies that are enabled in the settings there (and probably the same for Watson that only the taxonomies set are what are show in the Watson preview).

I tried replicating this today and was not able to; everything seems to be working as expected. I think this was a consequence of having multiple tabs open and switching between features without fully refreshing each time but if I notice it again, I can look to fix

@dkotter
Copy link
Collaborator

dkotter commented Dec 7, 2023

One other thing I was going to note that we may want to look at (probably in a separate PR at this point) is our threshold value goes from 0 to 100. As you can see in the screenshot above, we see values that are pretty close to each other, often separated by a decimal or two. It may make sense to allow more granular control in our threshold picker so you could choose something like 75.5% or 76.25%, as examples. There may also be other ways to more intuitively handle these very small differences in scores.

@faisal-alvi
Copy link
Member Author

I was testing the changes and found the threshold limit is no longer respected. We can see the assigned terms even if they do not match the min—required threshold.

image

I found a false condition that was causing this; fixing it fixed the issue. bcee113

@faisal-alvi
Copy link
Member Author

one item I noticed during UAT today was the Embeddings Preview appears to only show sample recommended terms if they were at or above the threshold set (e.g. 75%). It's possible that the demo site didn't have :ALOT: of sample terms so perhaps this was a sample data issue, but one thing that would be good to check is adding some terms that we know should be below the 75% threshold and then seeing if they appear but under the default 75% threshold in the previewer (e.g. 68%, 54%).

@jeffpaul the previewer respects the settings, and that is why it was not displaying the terms below 75%. I have updated the logic, so for the previewer, we will not consider the threshold limit setting and display the terms even if they have threshold value below the set limit.


Example screenshot; when 85% is set as category threshold, 80% for the other taxonomies:

before

image

after

image

@faisal-alvi
Copy link
Member Author

@dkotter also since the develop is merged here, I can see the E2E tests failing, not sure the exact cause.

@dkotter
Copy link
Collaborator

dkotter commented Dec 8, 2023

@dkotter also since the develop is merged here, I can see the E2E tests failing, not sure the exact cause.

Yeah, tests started failing a couple days ago on the Set up WP environment step. I did some initial digging and I think the problem is GitHub updated the environment we use (ubuntu-latest) and this is breaking things, though not exactly sure how to fix this yet

@dkotter dkotter merged commit 7e6f0b7 into develop Dec 12, 2023
13 checks passed
@dkotter dkotter deleted the enhancement/591 branch December 12, 2023 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:enhancement New feature or request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add previewing to OpenAI Embeddings post classification
4 participants