
Commit

Update README and Dockerfile
unclecode committed Jun 29, 2024
1 parent 61ae2de commit 7b0979e
Showing 2 changed files with 15 additions and 17 deletions.
25 changes: 8 additions & 17 deletions Dockerfile
@@ -18,12 +18,11 @@ RUN apt-get update && \
    software-properties-common && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    pip install --no-cache-dir spacy torch onnxruntime uvicorn && \
    python -m spacy download en_core_web_sm
# pip install --no-cache-dir spacy torch torchvision torchaudio onnxruntime uvicorn && \
# Copy the application code
COPY . .

# Install Crawl4AI using the local setup.py (which will use the default installation)
RUN pip install --no-cache-dir .

# Install Google Chrome and ChromeDriver
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
@@ -33,35 +32,27 @@ RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key
    wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip && \
    unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/

# Copy the rest of the application code
COPY . .

# Set environment to use Chrome and ChromeDriver properly
ENV CHROME_BIN=/usr/bin/google-chrome \
    CHROMEDRIVER=/usr/local/bin/chromedriver \
    DISPLAY=:99 \
    DBUS_SESSION_BUS_ADDRESS=/dev/null \
    PYTHONUNBUFFERED=1

# pip install -e .[all]
RUN pip install --no-cache-dir -e .[all]

# Ensure the PATH environment variable includes the location of the installed packages
ENV PATH /opt/conda/bin:$PATH

# Make port 80 available to the world outside this container
EXPOSE 80

# Download models via the CLI "crawl4ai-download-models"
RUN crawl4ai-download-models
# RUN crawl4ai-download-models

# Instakk mkdocs
# Install mkdocs
RUN pip install mkdocs mkdocs-terminal

# Call mkdocs to build the documentation
RUN mkdocs build

# Run uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80", "--workers", "4"]
7 changes: 7 additions & 0 deletions README.md
@@ -52,6 +52,13 @@ result = crawler.run(url="https://www.nbcnews.com/business")
print(result.markdown)
```

## How to install 🛠
```bash
virtualenv venv
source venv/bin/activate
pip install "crawl4ai @ git+https://github.com/unclecode/crawl4ai.git"
```
### Speed-First Design 🚀
Perhaps the most important design principle for this library is speed. It needs to handle many links and resources in parallel as quickly as possible. Combining that speed with fast LLM inference, such as Groq's, produces truly impressive results.
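
Once the install step added above succeeds, a minimal sanity check might look like this; the top-level module name `crawl4ai` is an assumption (the import path is not shown in this diff):

```bash
# Confirm the distribution landed in the active virtualenv
pip show crawl4ai

# Confirm the package imports cleanly (module name assumed to match the distribution)
python -c "import crawl4ai"
```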
