Version 1 (changes made by Julia Damerow, saved on Jul 19, 2017) compared with the current version (changes made by Julia Damerow, saved on Aug 01, 2017).

The Giles Ecosystem is a distributed system that runs OCR on images and extracts images and text from PDF files.

Components

The core components of the Giles Ecosystem are located in the following repositories:

Giles: https://github.com/diging/giles-eco-giles-web (this repository)
Nepomuk: https://github.com/diging/giles-eco-nepomuk (file storage)
Cepheus: https://github.com/diging/giles-eco-cepheus (image extraction from PDF files)
Andromeda: https://github.com/diging/giles-eco-andromeda (text extraction from PDF files)
Cassiopeia: https://github.com/diging/giles-eco-cassiopeia (OCR using Tesseract)

Dependencies

The system depends on the following software:

Apache Tomcat 8
Apache Kafka
Apache Zookeeper
MySQL (or PostgreSQL)
Tesseract OCR (https://github.com/tesseract-ocr/)
Digilib

Documentation

The Giles Ecosystem documentation (in progress) can be found here: Giles Ecosystem Home.

Running the Giles Ecosystem

A Docker Compose file for running the Giles Ecosystem in several Docker containers is available at: https://github.com/diging/giles-eco-docker