Τα Πάντα ῥεῖ (Everything flows)

Dataflow Kit is an open source web scraping framework written in Go.

Turn website data into spreadsheets, JSON or APIs with a simple point-and-click toolkit.


Try Dataflow Kit

Advanced Data Extraction.

The DFK platform provides an easy and convenient way to extract data from web pages.

Fetched data can be stored in a structured form such as a spreadsheet or a database. The currently supported output formats are CSV, Microsoft Excel, JSON and XML.

The extracted data can then be used for data mining, data processing or archiving.
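To make the output formats concrete, here is a minimal sketch that encodes the same scraped records as CSV, JSON and XML using only the Go standard library. The `Record` type, its fields and the `ToCSV`, `ToJSON` and `ToXML` helpers are hypothetical names chosen for illustration and are not part of the Dataflow Kit API. Microsoft Excel output is left out because the standard library has no encoder for it.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Record is a hypothetical scraped item; the field names are assumptions.
type Record struct {
	Title string `json:"title" xml:"title"`
	Price string `json:"price" xml:"price"`
}

// recordList wraps records in a single <records> root element for XML.
type recordList struct {
	XMLName xml.Name `xml:"records"`
	Items   []Record `xml:"record"`
}

// ToCSV writes a header row followed by one row per record.
func ToCSV(recs []Record) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"title", "price"}); err != nil {
		return "", err
	}
	for _, r := range recs {
		if err := w.Write([]string{r.Title, r.Price}); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

// ToJSON encodes the records as a JSON array of objects.
func ToJSON(recs []Record) (string, error) {
	b, err := json.Marshal(recs)
	return string(b), err
}

// ToXML encodes the records as <record> elements inside <records>.
func ToXML(recs []Record) (string, error) {
	b, err := xml.Marshal(recordList{Items: recs})
	return string(b), err
}

func main() {
	recs := []Record{{Title: "Go in Action", Price: "39.99"}}
	for _, encode := range []func([]Record) (string, error){ToCSV, ToJSON, ToXML} {
		out, err := encode(recs)
		if err != nil {
			fmt.Println("encode error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Keeping one record type and one small encoder per format means that supporting another format only requires adding another function of the same shape.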

  • Point, click and extract.
  • Work with any interactive site.
  • Extract data from multiple pages.
  • Scrape details from linked pages.
  • Follow the directives in robots.txt.
  • Save results as CSV, JSON or XML.
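As an illustration of what following robots.txt involves, the sketch below checks a URL path against the `Disallow` rules that apply to all user agents. It is a deliberately simplified, standard-library-only example and not Dataflow Kit's implementation: `DisallowedPrefixes` and `Allowed` are hypothetical helpers, and `Allow` lines, wildcards and agent-specific groups are ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// DisallowedPrefixes collects the Disallow path prefixes from every group
// that targets all agents ("User-agent: *"). Simplification: Allow rules,
// wildcards and agent-specific groups are not handled.
func DisallowedPrefixes(robots string) []string {
	var prefixes []string
	applies := false  // current group targets "*"
	inAgents := false // currently reading consecutive User-agent lines
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // strip trailing comment
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		switch key {
		case "user-agent":
			if !inAgents {
				// A User-agent line after rule lines starts a new group.
				applies = false
				inAgents = true
			}
			if value == "*" {
				applies = true
			}
		case "disallow":
			inAgents = false
			// An empty Disallow value means nothing is disallowed.
			if applies && value != "" {
				prefixes = append(prefixes, value)
			}
		default:
			inAgents = false
		}
	}
	return prefixes
}

// Allowed reports whether path matches none of the disallowed prefixes.
func Allowed(path string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return false
		}
	}
	return true
}

func main() {
	robots := "User-agent: *\nDisallow: /private/\n"
	rules := DisallowedPrefixes(robots)
	fmt.Println(Allowed("/private/report.html", rules))
	fmt.Println(Allowed("/catalog/page/2", rules))
}
```

A scraper would run a check like this before each request and skip any path for which `Allowed` returns false.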

Screenshot

[Screenshot: adding selectors]

Internals

Dataflow Kit is composed of two daemons:

  • fetch.d is the daemon that downloads HTML pages. It sends requests to a Splash server. Splash is a JavaScript rendering service, used to retrieve the fully rendered page content before it is passed to the parse.d daemon.
  • parse.d is the daemon that extracts data from the downloaded web pages, following the rules described in a JSON configuration file. Extracted data is returned in CSV, JSON or XML format.

Open source

Dataflow Kit is completely open source, and we welcome all contributors who are interested in collaborating.

You can help with issues, feature development, releases, scripting, tests, benchmarking, documentation or sample updates, or simply by sharing information about Dataflow Kit.

Please star the DFK GitHub repository.


Dataflow Kit on GitHub