Giles Ecosystem
The Giles Ecosystem is a distributed system for running OCR on images and for extracting images and text from PDF files.
Components
The core components of the Giles Ecosystem are located in the following repositories:
- Giles: https://github.com/diging/giles-eco-giles-web (this repository)
- Nepomuk: https://github.com/diging/giles-eco-nepomuk (file storage)
- Cepheus: https://github.com/diging/giles-eco-cepheus (image extraction from PDF files)
- Andromeda: https://github.com/diging/giles-eco-andromeda (text extraction from PDF files)
- Cassiopeia: https://github.com/diging/giles-eco-cassiopeia (OCR using Tesseract)
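To make the division of labor concrete, here is a hypothetical sketch of which components a file might pass through. The routing rules and the function name `plan_processing` are assumptions derived only from the one-line descriptions in the list above. They are not taken from the Giles code base, where the components run as separate services rather than Python functions.

```python
# Hypothetical mapping from a processing task to the component that handles it,
# based only on the one-line component descriptions (not on Giles source code).
COMPONENT_FOR_TASK = {
    "store": "Nepomuk",            # file storage
    "extract_text": "Andromeda",   # text extraction from PDF files
    "extract_images": "Cepheus",   # image extraction from PDF files
    "ocr": "Cassiopeia",           # OCR using Tesseract
}

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def plan_processing(filename: str) -> list[str]:
    """Return the assumed sequence of components a file would pass through."""
    tasks = ["store"]  # assumption: every uploaded file is stored first
    name = filename.lower()
    if name.endswith(".pdf"):
        # PDFs: extract embedded text and images, then OCR the extracted images
        tasks += ["extract_text", "extract_images", "ocr"]
    elif name.endswith(IMAGE_SUFFIXES):
        # Plain images go straight to OCR
        tasks.append("ocr")
    return [COMPONENT_FOR_TASK[task] for task in tasks]


print(plan_processing("scan.pdf"))   # ['Nepomuk', 'Andromeda', 'Cepheus', 'Cassiopeia']
print(plan_processing("page.TIFF"))  # ['Nepomuk', 'Cassiopeia']
```

A file type the sketch does not recognize is only stored, which mirrors the idea that extraction and OCR apply to PDFs and images.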
Dependencies
The system depends on the following software:
- Apache Tomcat 8
- Apache Kafka
- Apache Zookeeper
- MySQL (or PostgreSQL)
- Tesseract OCR (https://github.com/tesseract-ocr/)
- Digilib
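Before starting the Giles services, it can help to confirm that the dependencies are reachable. The following minimal sketch assumes each service listens on its usual default port (Tomcat 8080, Kafka 9092, Zookeeper 2181, MySQL 3306, or PostgreSQL 5432 instead) and that Tesseract is on the `PATH` as `tesseract`. The helper names `port_open` and `check_dependencies` are hypothetical and not part of Giles, and Digilib is not checked.

```python
import shutil
import socket
from typing import Optional

# Assumed default ports; adjust to match your installation.
# If you use PostgreSQL instead of MySQL, replace the MySQL entry with 5432.
DEFAULT_PORTS = {
    "Apache Tomcat": 8080,
    "Apache Kafka": 9092,
    "Apache Zookeeper": 2181,
    "MySQL": 3306,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


def check_dependencies(host: str = "localhost",
                       ports: Optional[dict[str, int]] = None) -> dict[str, bool]:
    """Map each dependency name to whether it appears to be available."""
    ports = DEFAULT_PORTS if ports is None else ports
    results = {name: port_open(host, port) for name, port in ports.items()}
    # Tesseract is a command-line tool, so look for the executable instead of a port
    results["Tesseract OCR"] = shutil.which("tesseract") is not None
    return results
```

Calling `check_dependencies()` returns a dictionary of booleans. A `False` entry points to a service that is not running, is not installed, or listens on a non-default port.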
Documentation
The Giles Ecosystem documentation (in progress) can be found on the Giles Ecosystem Home page.
Running the Giles Ecosystem
A Docker Compose file for running the Giles Ecosystem in several Docker containers is available at https://github.com/diging/giles-eco-docker