Chew is a Go library for processing various content types into markdown/plaintext.

mmatongo/chew

About

Chew is a Go library that processes various content types into markdown or plaintext. It supports multiple content types, including HTML, PDF, CSV, JSON, YAML, DOCX, PPTX, Markdown, Plaintext, MP3, FLAC, and WAVE.

Installation

go get github.com/mmatongo/chew

Usage

Here's a basic example of how to use Chew:

package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"

	"github.com/mmatongo/chew/v1"
)

func main() {
	urls := []string{
		"https://example.com",
	}

	config := chew.Config{
		UserAgent:       "Chew/1.0 (+https://github.com/mmatongo/chew)",
		RetryLimit:      3,
		RetryDelay:      5 * time.Second,
		CrawlDelay:      10 * time.Second,
		ProxyList:       []string{}, // Add your proxies here, or leave empty
		RateLimit:       2 * time.Second,
		RateBurst:       3,
		IgnoreRobotsTxt: false,
	}

	haChew := chew.New(config)

	// The timeout is optional: it cancels processing after 30 seconds.
	// Pass context.Background() instead if no deadline is needed.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	chunks, err := haChew.Process(ctx, urls)
	if err != nil {
		// errors.Is also matches a deadline error that has been wrapped.
		if errors.Is(err, context.DeadlineExceeded) {
			log.Println("Operation timed out")
		} else {
			log.Printf("Error processing URLs: %v", err)
		}
		return
	}

	for _, chunk := range chunks {
		fmt.Printf("Source: %s\nContent: %s\n\n", chunk.Source, chunk.Content)
	}
}

Output

Source: https://example.com
Content: Example Domain

Source: https://example.com
Content: This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

Source: https://example.com
Content: More information...
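
As the output shows, a single URL can produce several chunks. If one block of text per source is more convenient, the chunks can be regrouped after processing. The following is a minimal sketch, not part of Chew itself: the `chunk` type is a local stand-in that assumes only the `Source` and `Content` fields used in the example above, and `groupBySource` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a local stand-in for the values returned by Process. Only the
// Source and Content fields from the usage example are assumed.
type chunk struct {
	Source  string
	Content string
}

// groupBySource (hypothetical helper) joins the content of all chunks that
// share a source. order lists the sources in first-seen order, because Go
// maps do not preserve insertion order.
func groupBySource(chunks []chunk) (order []string, docs map[string]string) {
	parts := make(map[string][]string)
	for _, c := range chunks {
		if _, seen := parts[c.Source]; !seen {
			order = append(order, c.Source)
		}
		parts[c.Source] = append(parts[c.Source], c.Content)
	}

	docs = make(map[string]string, len(parts))
	for src, p := range parts {
		// Separate chunks with a blank line so paragraphs stay distinct.
		docs[src] = strings.Join(p, "\n\n")
	}
	return order, docs
}

func main() {
	chunks := []chunk{
		{Source: "https://example.com", Content: "Example Domain"},
		{Source: "https://example.com", Content: "More information..."},
	}

	order, docs := groupBySource(chunks)
	for _, src := range order {
		fmt.Printf("Source: %s\n%s\n", src, docs[src])
	}
}
```

With real results, the same loop would copy `chunk.Source` and `chunk.Content` from each returned chunk into the stand-in type, or the helper could be adapted to take Chew's own chunk type directly.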

You can find more examples in the examples directory as well as instructions on how to use Chew with Ruby and Python.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have any suggestions or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Logo

The logo was made by the amazing MariaLetta.

Similar Projects

docconv

Roadmap

The roadmap for this project is available here. It's meant more as a guide than a strict plan because I only work on this project in my free time.