Publish the anatomy of a coding assistant blog post #7002

Merged
ykdojo merged 39 commits into main from blog/anatomy-of-a-coding-assistant on Jun 18, 2024

@ykdojo (Contributor) commented Jun 12, 2024

No description provided.

netlify bot commented Jun 12, 2024

Deploy Preview for sourcegraph ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 140c807 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/sourcegraph/deploys/6670bf67e940f20008f5cb41 |
| 😎 Deploy Preview | https://deploy-preview-7002--sourcegraph.netlify.app |
ykdojo and others added 25 commits June 13, 2024 15:51
Co-authored-by: Beyang Liu <beyang@sourcegraph.com>
@kukicado (Contributor) left a comment

Great work @ykdojo!

content/blogposts/2024/anatomy-of-a-coding-assistant.md

1. **Conversation history:** In a given chat session, we record previous messages as they may contain relevant context for the user's next request.
2. **Code search:** We fetch the most relevant code snippets related to the user's query from the codebase using search, similar to how a human dev might search for these code snippets.
3. **User control:** Users should have the ability to mention specific files and provide those directly to the model, and also include the option to reference external sources like Slack threads or Notion documents to enrich the context further.
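The three context sources in this excerpt (conversation history, code search, user-mentioned files) could be combined roughly as follows. This is a minimal, hypothetical Python sketch: `ContextItem`, `keyword_search`, `assemble_context`, the priority order (mentions, then history, then search results), and the character budget are all assumptions for illustration, and the word-overlap scoring is a toy stand-in for real code search, not Cody's implementation.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str  # hypothetical labels: "mention", "history", or "search"
    label: str   # file path for code, empty string for chat messages
    text: str


def keyword_search(query: str, files: dict[str, str], top_k: int = 2) -> list[tuple[str, str]]:
    """Toy code search: rank files by how many distinct query words they contain."""
    terms = set(query.lower().split())
    scored = []
    for path, content in files.items():
        score = len(terms & set(content.lower().split()))
        if score > 0:
            scored.append((score, path, content))
    # Highest score first; ties broken by path so results are deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(path, content) for _, path, content in scored[:top_k]]


def assemble_context(
    query: str,
    history: list[str],
    files: dict[str, str],
    mentioned: list[str],
    max_chars: int = 2000,
) -> list[ContextItem]:
    """Combine the three sources; explicit @-mentions are placed first (an assumption)."""
    items = [ContextItem("mention", p, files[p]) for p in mentioned if p in files]
    items += [ContextItem("history", "", msg) for msg in history]
    for path, content in keyword_search(query, files):
        if path not in mentioned:  # avoid duplicating a file the user already mentioned
            items.append(ContextItem("search", path, content))
    # Skip any item that would overflow a simple character budget.
    result, used = [], 0
    for item in items:
        if used + len(item.text) <= max_chars:
            result.append(item)
            used += len(item.text)
    return result
```

Putting user-mentioned files first is one way to honor the "user control" point: context the user asks for explicitly is not crowded out by automatically retrieved snippets when the budget is tight.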
Contributor

Suggested change:

Original: 3. **User control:** Users should have the ability to mention specific files and provide those directly to the model, and also include the option to reference external sources like Slack threads or Notion documents to enrich the context further.
Suggested: 3. **User choice:** Users should have the ability to mention specific files and provide those directly to the model, and also include the option to reference external sources like Slack threads or Notion documents to enrich the context further.

Maybe "choice" instead of "control"?

Contributor Author

Hmm, here I think "control" fits better. "Choice" makes me think of choosing an LLM model, i.e., picking one thing out of several options.

Contributor

Makes sense. :)

content/blogposts/2024/anatomy-of-a-coding-assistant.md
- Diagnostic information like warnings and errors
- User-specified context from @-mentions

By including diagnostic information, we're able to provide more appropriate code edits to the selected range of code. In the future, we plan to incorporate code graph context here as well.
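The edit context described in this excerpt (selected range, diagnostics, @-mentions) could feed a prompt builder like the one below. This is a hypothetical Python sketch: `Diagnostic`, `build_edit_prompt`, and the prompt layout are assumptions for illustration, not Cody's actual prompt format. The step it highlights is filtering editor diagnostics down to those inside the selected range before they reach the model.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    line: int        # 0-based line number reported by the editor
    severity: str    # e.g. "error" or "warning"
    message: str


def build_edit_prompt(
    instruction: str,
    source_lines: list[str],
    start: int,
    end: int,
    diagnostics: list[Diagnostic],
    mentions: dict[str, str],
) -> str:
    """Assemble a hypothetical edit prompt for lines start..end (inclusive, 0-based)."""
    selected = "\n".join(source_lines[start : end + 1])
    # Keep only diagnostics that fall inside the selected range.
    relevant = [d for d in diagnostics if start <= d.line <= end]
    parts = [f"Instruction: {instruction}"]  # the user's prompt leads
    if relevant:
        parts.append("Diagnostics in the selection:")
        parts += [f"- line {d.line + 1} [{d.severity}]: {d.message}" for d in relevant]
    for name, content in mentions.items():
        parts.append(f"@{name}:\n{content}")
    parts.append(f"Selected code:\n{selected}")
    return "\n".join(parts)
```

Dropping diagnostics outside the selection keeps the model focused on problems it is actually allowed to fix in this edit.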
Contributor

Do we also want to say something along the lines of: "With the code editing feature, we put greater emphasis on the user's prompt and rely on the above context sources to generate high-quality code output."

This section feels a little incomplete and could use a few more details imo.

Contributor Author

Added:

With the code editing feature, we currently put greater emphasis on the user's prompt and rely on the above context sources to generate high-quality code output.

> This section feels a little incomplete and could use a few more details imo.

Definitely. Writing this part helped me realize that there's more we can do here from the product development perspective, too.

Contributor

Awesome! Thank you. Yeah, I think the editing/inserting code feature is a sleeping giant that could be huge if we invest more in it.

content/blogposts/2024/anatomy-of-a-coding-assistant.md

![13_autocomplete](https://storage.googleapis.com/sourcegraph-assets/blog/anatomy/13_autocomplete.png)

For this, we look at a few sources of information: the cursor position, the surrounding code, and the code graph (a code graph is a representation of the relationships and structures within a codebase, mapping entities such as classes and methods to show how they are interconnected). We use the current cursor position within the code graph to determine if the user wants a single-line suggestion or a multi-line suggestion. Once we determine that, we add more context by looking through recent files and open tabs. Within those, we find code snippets related to the code you're currently writing.
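Both steps in this autocomplete excerpt (deciding between a single-line and multi-line suggestion, then pulling related snippets from recent files and open tabs) can be sketched as below. This is a simplified, hypothetical Python sketch: it decides single vs. multi-line from the text around the cursor rather than from the code graph, and it ranks fixed-size line windows by Jaccard similarity of identifier sets, a common, cheap similarity measure swapped in here for illustration. `wants_multiline`, `best_snippets`, and the window size are assumptions.

```python
import re


def wants_multiline(prefix: str, suffix: str) -> bool:
    """Heuristic (an assumption): go multi-line when the cursor ends a line that opens a block."""
    current_line = prefix.rsplit("\n", 1)[-1]
    rest_of_line = suffix.split("\n", 1)[0]
    opens_block = current_line.rstrip().endswith(("{", ":", "(", "["))
    return opens_block and rest_of_line.strip() == ""


def identifiers(text: str) -> set[str]:
    """Extract identifier-like tokens; case is preserved."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection / size of union, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_snippets(
    prefix: str, open_files: dict[str, str], window: int = 5, top_k: int = 2
) -> list[tuple[str, int, str]]:
    """Rank fixed-size line windows from open files by similarity to the code just above the cursor."""
    query = identifiers("\n".join(prefix.splitlines()[-window:]))
    candidates = []
    for path, content in open_files.items():
        lines = content.splitlines()
        # Slide the window one line at a time; short files yield a single window.
        for start in range(max(1, len(lines) - window + 1)):
            chunk = "\n".join(lines[start : start + window])
            score = jaccard(query, identifiers(chunk))
            if score > 0:
                candidates.append((score, path, start, chunk))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(path, start, chunk) for _, path, start, chunk in candidates[:top_k]]
```

Jaccard similarity ignores token order and frequency, which keeps it cheap enough to recompute often; the trade-off is that it cannot distinguish semantically related code from code that merely reuses the same names, which is where code graph context could help.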
Member

It'd be good to explain how we use the code graph for autocomplete context. I know this is unclear to many people (both inside and out of Sourcegraph).

Contributor Author

Member

That indeed helps. However, it's still unclear to me when and if we actually use this. For example, is LSP context actually enabled by default for our customers?

Contributor Author

@jtibshirani (Member) left a comment

I like the "product of products" framing, and how it ties into the different context retrieval strategies. I had a couple high-level thoughts:

  • I left some suggestions for expanding the "Keyword Search" description to help emphasize our expertise in information retrieval/search and how we have a nuanced strategy based on code. This applies to both the "query understanding" and ranking steps.
  • I wonder if we should mention our internal evals briefly? Maybe just a short note about how we are "data driven" in making changes, using both offline and online metrics. I almost never hear other companies mention evals in blog posts, and this gives me a bad impression ... like "why do you have such a complicated pipeline? Did you even measure it??"

@jtibshirani (Member) left a comment

Thanks for resolving those comments! Did you see my comment about evals?

> I wonder if we should mention our internal evals briefly? Maybe just a short note about how we are "data driven" in making changes, using both offline and online metrics. I almost never hear other companies mention evals in blog posts, and this gives me a bad impression ... like "why do you have such a complicated pipeline? Did you even measure it??"

@ykdojo (Contributor Author) commented Jun 17, 2024

@jtibshirani Sorry I missed it, but mentioning our internal evals sounds good.

What can we say about it specifically, or where can I find more about it?

@jtibshirani (Member) previously approved these changes Jun 17, 2024

Looks good to me!

Co-authored-by: Julie Tibshirani <julietibs@apache.org>
@ykdojo ykdojo merged commit 156c834 into main Jun 18, 2024
6 checks passed
@ykdojo ykdojo deleted the blog/anatomy-of-a-coding-assistant branch June 18, 2024 18:07
4 participants