kaliy/kafka-connect-rss

Kafka Connect RSS

Kafka Connect RSS and Atom Source Connector.

Configuration

The connector polls one or more feed URLs and writes the output to a single topic. A sample configuration file can be found in the repository here.

URLs should be percent-encoded and separated by spaces. URLs are split evenly among tasks: for example, with 5 URLs and tasks.max set to 3, 3 tasks are created with 2, 2 and 1 URLs each. If tasks.max is higher than the number of URLs provided, only the necessary number of tasks is created, with 1 URL each.
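The split described above can be sketched in plain Java. This is an illustration of the documented behaviour only, not the connector's actual code; the class name `UrlTaskSplitter` and its `split` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: mirrors the documented URL-to-task distribution.
class UrlTaskSplitter {

    // rssUrls: space-separated, percent-encoded URLs (the rss.urls value).
    // tasksMax: the tasks.max value from the connector configuration.
    static List<List<String>> split(String rssUrls, int tasksMax) {
        if (tasksMax < 1) {
            throw new IllegalArgumentException("tasks.max must be at least 1");
        }
        List<String> urls = Arrays.asList(rssUrls.trim().split("\\s+"));

        // Never create more tasks than there are URLs.
        int groups = Math.min(tasksMax, urls.size());
        int base = urls.size() / groups;   // every task gets at least this many
        int extra = urls.size() % groups;  // the first 'extra' tasks get one more

        List<List<String>> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < groups; i++) {
            int size = base + (i < extra ? 1 : 0);
            result.add(new ArrayList<>(urls.subList(start, start + size)));
            start += size;
        }
        return result;
    }
}
```

With 5 URLs and a `tasksMax` of 3 this yields groups of 2, 2 and 1 URLs; with 2 URLs and a `tasksMax` of 5 it yields 2 groups of 1 URL each.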

The connector has the following configuration options:

| Name | Description | Type | Default Value | Importance |
| --- | --- | --- | --- | --- |
| rss.urls | RSS or Atom feed URLs | string | | high |
| topic | Topic to write to | string | | high |
| sleep.seconds | Time in seconds that the connector waits before querying a feed again | int | 60 | medium |
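As a rough sketch, the three options can be assembled programmatically, for example before submitting them to a Connect worker. The class `RssConnectorConfigSketch`, its methods and the example.com feed URL are assumptions made for illustration. Only the keys `rss.urls`, `topic` and `sleep.seconds` come from this README; settings that every connector needs, such as `connector.class`, are deliberately left out because they are not documented here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for building the connector-specific settings.
class RssConnectorConfigSketch {

    // Percent-encodes characters that are illegal in a URL (e.g. spaces)
    // using the multi-argument java.net.URI constructor.
    static String encode(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid feed URL", e);
        }
    }

    // Builds the three options listed in the table above.
    static Map<String, String> config(List<String> encodedUrls, String topic, int sleepSeconds) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("rss.urls", String.join(" ", encodedUrls)); // space-separated
        props.put("topic", topic);
        props.put("sleep.seconds", Integer.toString(sleepSeconds));
        return props;
    }
}
```

For example, `encode("https", "example.com", "/my feed/rss.xml", null)` produces a URL in which the space is escaped as `%20`, so it is safe to join with other URLs using spaces.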

Output

Each message has the following schema:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "struct",
        "fields": [
          {
            "type": "string",
            "optional": true,
            "field": "title"
          },
          {
            "type": "string",
            "optional": false,
            "field": "url"
          }
        ],
        "optional": false,
        "name": "org.kaliy.kafka.rss.Feed",
        "version": 1,
        "field": "feed"
      },
      {
        "type": "string",
        "optional": false,
        "field": "title"
      },
      {
        "type": "string",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": false,
        "field": "link"
      },
      {
        "type": "string",
        "optional": true,
        "field": "content"
      },
      {
        "type": "string",
        "optional": true,
        "field": "author"
      },
      {
        "type": "string",
        "optional": true,
        "field": "date"
      }
    ],
    "optional": false,
    "name": "org.kaliy.kafka.rss.Item",
    "version": 1
  }
}

Sample message with JSON converter without embedded schema:

{
  "feed": {
    "title": "CNN.com - RSS Channel - App International Edition",
    "url": "http://rss.cnn.com/rss/edition.rss"
  },
  "title": "The 56,000-mile electric car journey",
  "id": "https://www.cnn.com/2019/03/22/motorsport/electric-car-around-the-world-wiebe-wakker-spt-intl/index.html",
  "link": "https://www.cnn.com/2019/03/22/motorsport/electric-car-around-the-world-wiebe-wakker-spt-intl/index.html",
  "content": "For three years and 90,000 kilometers and counting, he's traveled the world powered both by electricity and strangers' kindness.",
  "author": "CNN",
  "date": "2019-03-22T13:34:17Z"
}
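Going by the schema above, only `feed.url`, `title`, `id` and `link` are non-optional, so a consumer should be prepared for the other fields to be absent or null. The following is a minimal consumer-side sketch, assuming the message has already been deserialized into a `Map` by a JSON library of your choice; `ItemValidator` is a hypothetical name and not part of the connector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-side check based on the "optional" flags in the schema.
class ItemValidator {

    // Returns the names of required fields that are absent or not strings.
    static List<String> missingRequiredFields(Map<String, Object> item) {
        List<String> missing = new ArrayList<>();

        // Top-level fields declared with "optional": false.
        for (String field : new String[] {"title", "id", "link"}) {
            if (!(item.get(field) instanceof String)) {
                missing.add(field);
            }
        }

        // The nested feed struct is required, and so is its url field;
        // feed.title is optional and therefore not checked.
        Object feed = item.get("feed");
        if (!(feed instanceof Map)) {
            missing.add("feed");
        } else if (!(((Map<?, ?>) feed).get("url") instanceof String)) {
            missing.add("feed.url");
        }
        return missing;
    }
}
```

An empty result means the message carries every required field; `content`, `author` and `date` still need a null check before use.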

Changelog

  • 0.1.0 (2019-03-24): Initial release
  • 0.1.1 (2022-11-24):
    • Address known vulnerabilities by upgrading dependencies (#6)
    • Handle NullPointerException in Confluent Control Center (#8)
  • 0.1.2 (2022-11-24):
    • Support podcasts; for podcast items the link is a URL to download a file (#11)

Development

Some development notes can be found here.

To compile the project and run the unit and integration tests, use the mvn verify command.