upd: updated readme and fixed setup.py
PeriniM committed Feb 15, 2024
1 parent c50bb55 commit c5ec351
Showing 4 changed files with 60 additions and 172 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,6 +1,6 @@
# Contributing to YOSO-ai
# Contributing to ScrapeGraphAI

Thank you for your interest in contributing to **YOSO-ai**! We welcome contributions from the community to help improve and grow the project. This document outlines the guidelines and steps for contributing.
Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome contributions from the community to help improve and grow the project. This document outlines the guidelines and steps for contributing.

## Table of Contents

226 changes: 57 additions & 169 deletions README.md
@@ -1,203 +1,82 @@
# 🤖 Scrapegraph-ai: You Only Scrape Once
# 🕷️ ScrapeGraphAI: You Only Scrape Once

Scrapegraph-ai is a Python **Open Source** library that uses LLMs and LangChain for faster and more efficient web scraping. Just say which information you want to extract and the library will do it for you.
ScrapeGraphAI is a *web scraping* Python library based on LangChain which uses LLM and direct graph logic to create scraping pipelines.
Just say which information you want to extract and the library will do it for you!

Official documentation page: [Scrapegraph-ai.readthedocs.io](https://Scrapegraph-ai.readthedocs.io/en/latest/index.html)

# 🔍 Demo

Try out Scrapegraph-ai in your browser:

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/VinciGit00/Scrapegraph-ai)

# 🔧 Quick Setup
<p align="center">
<img src="docs/assets/scrapegraphai_logo.png" alt="Scrapegraph-ai Logo" style="width: 50%;">
</p>

Follow these steps:

1.
## 🚀 Quick install

```bash
git clone https://github.com/VinciGit00/Scrapegraph-ai.git
pip install scrapegraphai
```
## 🔍 Demo

2. (Optional)

```bash
python -m venv venv
source ./venv/bin/activate
```
Try out ScrapeGraphAI in your browser:

3.
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/VinciGit00/Scrapegraph-ai)

```bash
pip install -r requirements.txt
# if you want to install it as a library
pip install .
## 📖 Documentation

# or if you plan on developing new features it is best to also install the extra dependencies using
The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).

pip install -r requirements-dev.txt
# if you want to install it as a library
pip install .[dev]
```
## 💻 Usage

4. Create your personal OpenAI API key from [here](https://platform.openai.com/api-keys)
5. (Optional) Create a .env file inside the main folder and paste the API key
### Case 1: Extracting information using a prompt

```config
API_KEY="your openai.com api key"
```
You can use the `SmartScraper` class to extract information from a website using a prompt.

6. You are ready to go! 🚀
7. Try running the examples using:

```bash
python -m examples.html_scraping
# or if you are outside of the project folder
python -m Scrapegraph-ai.examples.html_scraping
```

# 📖 Examples
The `SmartScraper` class is a direct graph implementation that uses the most common nodes present in a web scraping pipeline. For more information, please see the [documentation](https://scrapegraph-ai.readthedocs.io/en/latest/).
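
Because this is a unified diff, the added lines of the new usage example are interleaved further down with the removed Case 2 example. Assembled on their own, they read roughly as follows (import path, constructor signature, and values exactly as they appear in the added lines):

```python
from scrapegraphai.graphs import SmartScraper

OPENAI_API_KEY = "YOUR_API_KEY"

# Configuration for the LLM that backs the scraping graph
llm_config = {
    "api_key": OPENAI_API_KEY,
    "model_name": "gpt-3.5-turbo",
}

# Prompt describing what to extract, the target URL, and the LLM configuration
smart_scraper = SmartScraper("List me all the titles and project descriptions",
                             "https://perinim.github.io/projects/", llm_config)

# Run the scraping pipeline and print the extracted dictionary
answer = smart_scraper.run()
print(answer)
```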

```python
import os
from dotenv import load_dotenv
from scrapegraphai import _get_function, send_request

load_dotenv()

def main():
# Get OpenAI API key from environment variables
openai_key = os.getenv("API_KEY")
if not openai_key:
print("Error: OpenAI API key not found in environment variables.")
return

# Example values for the request
request_settings = [
{
"title": "title_news",
"type": "str",
"description": "Give me the name of the news"
}
]

# Choose the desired model and other parameters
selected_model = "gpt-3.5-turbo"
temperature_value = 0.7

# Mockup World URL
mockup_world_url = "https://sport.sky.it/nba?gr=www"

# Invoke send_request function
result = send_request(openai_key, _get_function(mockup_world_url), request_settings, selected_model, temperature_value, 'cl100k_base')

# Print or process the result as needed
print("Result:", result)

if __name__ == "__main__":
main()
```

### Case 2: Passing your own HTML code
from scrapegraphai.graphs import SmartScraper

```python
import os
from dotenv import load_dotenv
from scrapegraphai import send_request

load_dotenv()

# Example using an HTML snippet
query_info = '''
Given this code extract all the information in a json format about the news.
<article class="c-card__wrapper aem_card_check_wrapper" data-cardindex="0">
<div class="c-card__content">
<h2 class="c-card__title">Booker show with 52 points, whoever has the most games over 50</h2>
<div class="c-card__label-wrapper c-label-wrapper">
<span class="c-label c-label--article-heading">Standings</span>
</div>
<p class="c-card__abstract">The Suns' No. 1 dominated the match won in New Orleans, scoring 52 points. It's about...</p>
<div class="c-card__info">
<time class="c-card__date" datetime="20 gen - 07:54">20 gen - 07:54</time>
...
</div>
</div>
<div class="c-card__img-wrapper">
<figure class="o-aspect-ratio o-aspect-ratio--16-10 ">
<img crossorigin="anonymous" class="c-card__img j-lazyload" alt="Partite con 50+ punti: Booker in Top-20" data-srcset="..." sizes="..." loading="lazy" data-src="...">
<noscript>
<img crossorigin="anonymous" class="c-card__img" alt="Partite con 50+ punti: Booker in Top-20" srcset="..." sizes="..." src="...">
</noscript>
</figure>
<i class="icon icon--media icon--gallery icon--medium icon--c-primary">
</i>
</div>
</article>
'''
def main():
# Get OpenAI API key from environment variables
openai_key = os.getenv("API_KEY")
if not openai_key:
print("Error: OpenAI API key not found in environment variables.")
return

# Example values for the request
request_settings = [
{
"title": "title",
"type": "str",
"description": "Title of the news"
}
]

# Choose the desired model and other parameters
selected_model = "gpt-3.5-turbo"
temperature_value = 0.7

# Invoke send_request function
result = send_request(openai_key, query_info, request_settings, selected_model, temperature_value, 'cl100k_base')

# Print or process the result as needed
print("Result:", result)

if __name__ == "__main__":
main()
```
OPENAI_API_KEY = "YOUR_API_KEY"

Note: all the models are available at the following link: [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models); be sure your API key has access to the model you choose
llm_config = {
"api_key": OPENAI_API_KEY,
"model_name": "gpt-3.5-turbo",
}

# Example of output
smart_scraper = SmartScraper("List me all the titles and project descriptions",
"https://perinim.github.io/projects/", llm_config)

Given the following input
answer = smart_scraper.run()
print(answer)
```

```python
[
{
"title": "title",
"type": "str",
"description": "Title of the news"
}
]
The output will be a dictionary with the extracted information, for example:

```bash
{
'titles': [
'Rotary Pendulum RL'
],
'descriptions': [
'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'
]
}
```
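
As a minimal sketch of how the returned dictionary might be consumed (assuming the `titles` and `descriptions` keys shown in the sample output above; other prompts may yield different keys):

```python
# Sample result copied from the README's example output above; in practice this
# is the kind of dict that smart_scraper.run() returns for the prompt shown.
answer = {
    "titles": ["Rotary Pendulum RL"],
    "descriptions": [
        "Open Source project aimed at controlling a real life rotary pendulum using RL algorithms"
    ],
}

# Iterate over the extracted fields pairwise
for title, description in zip(answer["titles"], answer["descriptions"]):
    print(f"{title}: {description}")
```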

using as input the website [https://sport.sky.it/nba?gr=www](https://sport.sky.it/nba?gr=www)
## 🤝 Contributing

Contributions are welcome! Please check out the todos below, and feel free to open a pull request.
For more information, please see the [contributing guidelines](CONTRIBUTING.md).

The output is a dict with the following format:
After creating and activating the virtual environment, please remember to install the library with the "dev" extra so that the development dependencies are also installed.

```bash
{
'title': 'Booker show with 52 points, whoever has the most games over 50'
}
pip install -e .[dev]
```

# Credits
Thanks to:
- [nicolapiazzalunga](https://github.com/nicolapiazzalunga): for inspiring scrapegraphai/convert_to_csv.py and scrapegraphai/convert_to_json.py functions
## Contributors

# Developed by
[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)

## Authors

<p align="center">
<a href="https://vincigit00.github.io/">
@@ -210,3 +89,12 @@ Thanks to:
<img src="docs/assets/logo_perinilab.png" alt="PeriniLab Logo" style="width: 30%;">
</a>
</p>

## 📜 License

ScrapeGraphAI is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more information.

## Acknowledgements

- We would like to thank all the contributors to the project and the open-source community for their support.
- ScrapeGraphAI is meant to be used for data exploration and research purposes only. We are not responsible for any misuse of the library.
Binary file added docs/assets/scrapegraphai_logo.png
2 changes: 1 addition & 1 deletion setup.py
@@ -15,7 +15,7 @@ def load_requirements(filename):
setup(
name='scrapegraphai',
version='0.1.0', # MAJOR.MINOR.PATCH
description='A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines.
description='A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines.',
long_description=open('README.md', encoding='utf-8').read(),
long_description_content_type='text/markdown',
url='https://scrapegraph-ai.readthedocs.io/',
