Minor Improvement + DAG bugs fixes #326

Merged: 2 commits merged into main from maintenance-and-minor-upgrade on Jun 18, 2024

@davidgxue davidgxue commented Jun 17, 2024

Bug Fixes

  • DAG: ask_astro_load_astro_cli_docs failure
    df["content"] = df["content"].apply(enforce_max_token_len)
    KeyError: 'content'
  • DAG: ask_astro_load_stackoverflow failure
    page above 25 requires access token or app key
  • DAG: ask_astro_load_blogs failure
      File "/usr/local/airflow/include/tasks/extract/blogs.py", line 56, in <lambda>
        lambda x: BeautifulSoup(x, "lxml").find(class_="post-card__meta").find(class_="title").get_text()
    AttributeError: 'NoneType' object has no attribute 'find'

Cause of the blogs failure: the Astro blog page markup has changed, so the post-card__meta lookup returns None.
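The AttributeError in the traceback comes from chaining `.find()` calls on a result that can be None. A minimal defensive sketch follows; it is not necessarily the change made in this PR. The helper name `extract_title` is hypothetical, and the sketch uses the built-in `html.parser` instead of `lxml` only to keep dependencies light.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Return the post title, or None when the expected markup is missing.

    Hypothetical helper: the class names are the ones from the traceback,
    everything else is an assumption for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")

    # First lookup: the wrapper element. find() returns None when no
    # element carries this class, e.g. after a page redesign.
    meta = soup.find(class_="post-card__meta")
    if meta is None:
        return None

    # Second lookup: the title inside the wrapper, guarded the same way.
    title = meta.find(class_="title")
    if title is None:
        return None

    # strip=True trims surrounding whitespace (the original call did not).
    return title.get_text(strip=True)
```

Returning None lets the caller decide whether to skip the row or fail the task with a clearer message than an AttributeError raised inside a lambda.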

  • Astro Docs ingest DAG
    The DAG was still using the outdated URL doc.astronomer.io, but Astronomer has moved its docs to www.astronomer.io/docs
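For the ask_astro_load_stackoverflow failure listed above, the error message indicates that pages beyond 25 need an access token or app key. The sketch below shows one way to build a paginated Stack Exchange request URL that attaches a key and fails early when a deep page is requested without one. The function name, the API version in the base URL, and the page size are assumptions for illustration; this is not the code from the PR.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint and version; adjust to whatever the DAG actually calls.
API_BASE = "https://api.stackexchange.com/2.3/questions"
# Limit taken from the error message quoted in the bug list.
ANONYMOUS_PAGE_LIMIT = 25


def build_questions_url(page: int, tag: str, api_key: Optional[str] = None) -> str:
    """Build a paginated questions URL, requiring a key for deep pages."""
    if page > ANONYMOUS_PAGE_LIMIT and not api_key:
        # Fail before the request instead of mid-ingest.
        raise ValueError(
            f"page {page} exceeds {ANONYMOUS_PAGE_LIMIT}; an app key is required"
        )

    params = {
        "page": page,
        "pagesize": 100,
        "site": "stackoverflow",
        "tagged": tag,
    }
    if api_key:
        params["key"] = api_key

    return f"{API_BASE}?{urlencode(params)}"
```

In an Airflow deployment the key would typically come from a connection or environment variable rather than being passed as a literal.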

Minor Improvements

  • Remove GitHub issues from the ingest sources
    • They have added nothing but noise. Most closed issues are bug reports that have already been fixed, and retrieving them causes the LLM to think the bug persists.
  • GitHub Registry docs reformat
    What Ask Astro previously ingested from the registry gave the LLM no real insight.
    • How would the LLM know how to use a module from this?
    • Add operator usage and parameter type details
      Example of what we had before:
# Registry
## Provider: astro-sdk-python
Version: 1.8.0
Module: dataframe
Module Description: This decorator will allow users to write python functions while treating SQL
tables as dataframes.
  • Upgrade from Cohere Rerank 2 to Rerank 3
    • Cohere emailed to ask whether we could move to Rerank 3. It is cheaper, better, and faster.
  • Upgrade from GPT-4 Turbo to GPT-4o
  • System Prompt Changes
    • Better LLM filter as the last step to get rid of unhelpful documents
    • Ask the LLM not to include URLs that do not explicitly appear in the context
    • Ask the LLM to explicitly cite sources whenever possible. This overrides the LLM stuffing template and function in LangChain so that the DocLink and document number are passed to the LLM.
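The source-citation change can be pictured as formatting each retrieved document with a number and its link before the documents are concatenated into the prompt context. The sketch below uses plain Python rather than LangChain's own classes; `RetrievedDoc`, its fields, and the exact label text are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved document and its source link."""
    content: str
    doc_link: str


def stuff_documents(docs: List[RetrievedDoc]) -> str:
    """Join documents into one context string, labeling each with a
    document number and link so the LLM can cite them explicitly."""
    blocks = []
    for number, doc in enumerate(docs, start=1):
        blocks.append(
            f"Document #{number}\nDocLink: {doc.doc_link}\n{doc.content}"
        )
    # A blank line between blocks keeps document boundaries visible.
    return "\n\n".join(blocks)
```

Because every URL the model sees is tied to a numbered document, the system prompt can ask it to cite only links that appear in the context.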

@davidgxue davidgxue changed the title Minor Improvement + DAG bugs fixes [WIP] Minor Improvement + DAG bugs fixes Jun 17, 2024

cloudflare-workers-and-pages bot commented Jun 17, 2024

Deploying ask-astro with Cloudflare Pages

Latest commit: 0735673
Status: ✅  Deploy successful!
Preview URL: https://42c5e62d.ask-astro.pages.dev
Branch Preview URL: https://maintenance-and-minor-upgrad.ask-astro.pages.dev


@davidgxue davidgxue changed the title [WIP] Minor Improvement + DAG bugs fixes Minor Improvement + DAG bugs fixes Jun 18, 2024
@davidgxue davidgxue marked this pull request as ready for review June 18, 2024 16:39
@davidgxue davidgxue merged commit eb58b23 into main Jun 18, 2024
8 checks passed
@davidgxue davidgxue deleted the maintenance-and-minor-upgrade branch June 18, 2024 20:04
2 participants