Run spell check against all of the documentation #2318

Closed
simonw opened this issue Apr 10, 2024 · 5 comments

Comments

@simonw
Owner

simonw commented Apr 10, 2024

I thought codespell was handling that for me in https://github.com/simonw/datasette/blob/d32176c5b8283fbcc5c8a1f8e6d39339d73013c4/.github/workflows/spellcheck.yml but then I spotted this typo: d32176c

So I'm going to have a go at running everything through Gemini Pro 1.5.

@simonw
Owner Author

simonw commented Apr 10, 2024

files-to-prompt docs/*.rst | llm -m pro15 --system \
  'identify and list any possible spelling mistakes, for each one show the file and the surrounding text'

Possible Spelling Mistakes:

File: docs/authentication.rst

  1. "anaynomous":
    • Surrounding Text: "You can specify that only unauthenticated actors (from anaynomous HTTP requests) should be allowed access using the special"
    • Suggested Correction: "anonymous"

File: docs/binary_data.rst

  1. "infulunce":
    • Surrounding Text: "... It lets Datasette know if the current SQL query can be represented by the plugin - and hence infulunce if a link to this output format is displayed in the user interface."
    • Suggested Correction: "influence"

File: docs/changelog.rst

  1. "collides":
    • Surrounding Text: "... setting rules for what should happen if a primary key collides with an existing row."
    • Suggested Correction: "collides" is likely spelled correctly, but consider alternatives like "conflicts" depending on the intended meaning.
  2. "datsette":
    • Surrounding Text: "The **datsette** --get command-line mechanism now ensures any plugins using the startup() hook are correctly executed."
    • Suggested Correction: "datasette"
  3. "frist":
    • Surrounding Text: "Datasette will only return the first 100 rows - the 101st is used purely to detect if there should be another page. Since the where clause acts against the index on the primary key, the query is extremely frist even for records that are a long way into the overall pagination set."
    • Suggested Correction: "fast"

File: docs/configuration.rst

  1. "correct":
    • Surrounding Text: "... Some plugins can be configured at the database or table level. These should use a plugins key nested under the appropriate place within the databases object:\n...\n\n my_other_database:\n tables:\n my_table:\n # plugin configuration for the my_table table inside the my_other_database database\n plugins:\n datasette-my-plugin:\n key: my_value"
    • Suggested Correction: "correct" is likely spelled correctly, but consider alternatives like "relevant" depending on the intended meaning.

File: docs/json_api.rst

  1. "correct":
    • Surrounding Text: "... The "rows" key is a list of objects, each one representing a row. \n\nThe "truncated" key lets you know if the query was truncated."
    • Suggested Correction: "truncated" is likely spelled correctly, but consider alternatives like "limited" depending on the intended meaning.
  2. "correct":
    • Surrounding Text: "... The JSON array option is useful if one of your matching values itself contains a comma:\n\n ?column__in=[""value"",""value,with,commas""]"
    • Suggested Correction: "correct" is likely spelled correctly, but consider alternatives like "provided" or "specified" depending on the intended meaning.
  3. "correct":
    • Surrounding Text: "... you may decide that a specific query can be cached for a longer time, or maybe you need to set ?_ttl=0 for some requests for example if you are running a SQL order by random() query."
    • Suggested Correction: "correct" is likely spelled correctly, but consider alternatives like "appropriate" or "desired" depending on the intended meaning.

File: docs/metadata.rst

  1. "correct":
    • Surrounding Text: "... If your table has more than two columns you can specify which column should be used for the link label with the label_column property:"
    • Suggested Correction: "correct" is likely spelled correctly, but consider alternatives like "intended" or "desired" depending on the intended meaning.

File: docs/testing_plugins.rst

  1. "than":
    • Surrounding Text: "... Datasette will search for this file first in the --template-dir= location, if it was specified - than in the plugin's bundled templates and finally in Datasette's set of default templates."
    • Suggested Correction: "then"
  2. "than":
    • Surrounding Text: "... This method registers any :ref:plugin_hook_startup or :ref:plugin_hook_prepare_jinja2_environment plugins that might themselves need to make async calls.\n\nIf you are using await datasette.client.get() and similar methods than you don't need to worry about this - Datasette automatically calls invoke_startup() the first time it handles a request."
    • Suggested Correction: "then"
  3. "or":
    • Surrounding Text: "If an exception occurs within Datasette itself during a test, the response returned to your plugin will have a response.status_code value of 500.\n\nYou can add pdb=True to the Datasette constructor to drop into a Python debugger session inside your test run instead of getting back a 500 response code. This is equivalent to running the datasette command-line tool with the --pdb option.\n\nHere's what that looks like in a test function:\n\n.. code-block:: python\n\n def test_that_opens_the_debugger_or errors():\n ds = Datasette([db_path], pdb=True)\n response = await ds.client.get("/")"
    • Suggested Correction: "on"

File: docs/writing_plugins.rst

  1. "thang":
    • Surrounding Text: "The exception to this rule is anythang that is documented here. If you find a need for an undocumented utility function in your own work, consider opening an issue <https://github.com/simonw/datasette/issues/new>__ requesting that the function you are using be upgraded to documented and supported status."
    • Suggested Correction: "thing"

@simonw
Owner Author

simonw commented Apr 11, 2024

Hah, that's amusing. It hallucinated the incorrect spellings a bit:

  1. "anaynomous":
    • Surrounding Text: "You can specify that only unauthenticated actors (from anaynomous HTTP requests) should be allowed access using the special"

The misspelling is actually anynomous:

You can specify that only unauthenticated actors (from anynomous HTTP requests) should be allowed access using the special ``"unauthenticated": true`` key in an allow block (`allow demo <https://latest.datasette.io/-/allow-debug?actor=null&allow=%7B%0D%0A++++%22unauthenticated%22%3A+true%0D%0A%7D>`__, `deny demo <https://latest.datasette.io/-/allow-debug?actor=%7B%0D%0A++++%22id%22%3A+%22hello%22%0D%0A%7D&allow=%7B%0D%0A++++%22unauthenticated%22%3A+true%0D%0A%7D>`__):

@simonw
Owner Author

simonw commented Apr 11, 2024

Hah, yeah, Gemini Pro 1.5 (and maybe LLMs in general) isn't great at spellcheck.

It pointed to "the query is extremely frist even for records that" but the sentence was actually "the query is extremely fast even for records that".
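One way to catch this kind of hallucination is to check each reported misspelling against the file it was supposedly found in. The following is a minimal sketch, not something that was run in this thread: `find_hallucinated` and its `read_text` argument are hypothetical names, and the sample strings are shortened from the text quoted above.

```python
import re


def find_hallucinated(reports, read_text):
    """Return the (path, word) reports whose word does not occur in the file.

    reports: iterable of (path, word) pairs parsed from the model output.
    read_text: callable that maps a path to that file's text (hypothetical;
        in real use this could be lambda p: pathlib.Path(p).read_text()).
    """
    missing = []
    for path, word in reports:
        # Whole-word, case-insensitive match so "frist" does not match "first".
        pattern = rf"\b{re.escape(word)}\b"
        if not re.search(pattern, read_text(path), re.IGNORECASE):
            missing.append((path, word))
    return missing


if __name__ == "__main__":
    # In-memory stand-ins for the real docs, using text quoted in this thread.
    docs = {
        "docs/authentication.rst": "actors (from anynomous HTTP requests) should be",
        "docs/changelog.rst": "the query is extremely fast even for records that",
    }
    reports = [
        ("docs/authentication.rst", "anaynomous"),
        ("docs/changelog.rst", "frist"),
    ]
    for path, word in find_hallucinated(reports, docs.__getitem__):
        print(f"{path}: {word!r} does not appear in the file")
```

Both sample reports are flagged here, because neither "anaynomous" nor "frist" occurs in the text. A flagged report is not necessarily useless: the "anaynomous" report still pointed at a real typo ("anynomous"), so flagged entries are worth a manual look rather than automatic rejection.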

@simonw
Owner Author

simonw commented Apr 11, 2024

There were also lots of suggestions like this one. I can probably fix those with a prompt that tells it not to do this:

"correct" is likely spelled correctly, but consider alternatives like "appropriate" or "desired" depending on the intended meaning.
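A different approach from the prompt change mentioned above would be to filter these non-corrections out of the output after the fact. This is only a sketch under that assumption: `is_real_correction` is a hypothetical helper, and it relies on the "likely spelled correctly" wording seen in the output above, which a model is not guaranteed to repeat.

```python
def is_real_correction(word, suggestion):
    """Return True when the suggestion proposes a different spelling.

    word: the word the model flagged, e.g. "anaynomous".
    suggestion: the raw "Suggested Correction" text from the model output.
    """
    text = suggestion.strip()
    # Drop style advice of the form: "X" is likely spelled correctly, but ...
    if "likely spelled correctly" in text.lower():
        return False
    # Drop suggestions that just repeat the flagged word.
    return text.strip('"').lower() != word.lower()


if __name__ == "__main__":
    samples = [
        ("anaynomous", '"anonymous"'),
        ("collides", '"collides" is likely spelled correctly, but consider '
                     'alternatives like "conflicts" depending on the intended meaning.'),
    ]
    for word, suggestion in samples:
        print(word, is_real_correction(word, suggestion))
```

This keeps the "anonymous" suggestion and drops the "collides" one. It does nothing about hallucinated misspellings, which need a check against the source files.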

@simonw
Owner Author

simonw commented Apr 11, 2024

I'm done with this experiment for now.

simonw closed this as completed in 63714cb on Apr 11, 2024
simonw mentioned this issue on Aug 5, 2024