The Giles Ecosystem is a distributed system for extracting images and text from PDFs and for running OCR on images and PDFs. It can be scaled easily to accommodate higher workloads. The Giles Ecosystem is being developed by the Digital Innovation Group at Arizona State University.

If you are an end user and just want to use Giles, head over to the User Documentation. If you are a developer interested in setting up the Giles Ecosystem, check out the Developer Documentation. If you are trying to connect your application to Giles, see the API Documentation.

System Requirements

Apache Zookeeper (https://zookeeper.apache.org/)
Apache Kafka (https://kafka.apache.org/)
MySQL (or PostgreSQL)
Digilib (http://digilib.sourceforge.net/)
Tomcat 8
Java 8
Solr (if Freddie is added to the system)

Relevant GitHub Repositories

Apps

Giles: giles-eco-giles-web
Frontend for uploading and retrieving images. Provides a REST interface as well as a GUI.
Nepomuk: giles-eco-nepomuk
Storage backend. Receives storage requests through Kafka and provides a REST interface to retrieve stored files.
Cepheus: giles-eco-cepheus
PDF image extraction backend. Extracts images from PDFs. Receives extraction requests through Kafka and provides a REST interface to retrieve extraction results.
Andromeda: giles-eco-andromeda
PDF text extraction backend. Extracts text from PDFs. Receives extraction requests through Kafka and provides a REST interface to retrieve extraction results.
Cassiopeia: giles-eco-cassiopeia
Wrapper for Tesseract to run OCR on submitted and extracted images.
Freddie: giles-eco-freddie
Connector for Solr.
September: giles-eco-september
Monitoring component for the Giles Ecosystem.
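
The request flow described for the Kafka-driven backends (Nepomuk, Cepheus, Andromeda) can be illustrated with a small in-memory sketch. This is not code from any of the repositories above: the topic names, the message shape, and the InMemoryBroker class are assumptions made for illustration, and a plain queue stands in for Kafka so the example runs without a broker.

```python
from collections import defaultdict, deque

# Hypothetical topic names mapped to the backend that consumes them.
# The real Kafka topic names used by the Giles Ecosystem are not given here.
TOPICS = {
    "storage": "Nepomuk",
    "image_extraction": "Cepheus",
    "text_extraction": "Andromeda",
}


class InMemoryBroker:
    """Stand-in for Kafka: one FIFO queue per topic."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def publish(self, topic, message):
        """Append a request message to the queue of a known topic."""
        if topic not in TOPICS:
            raise ValueError(f"unknown topic: {topic}")
        self._queues[topic].append(message)

    def poll(self, topic):
        """Return the oldest pending message for a topic, or None."""
        queue = self._queues[topic]
        return queue.popleft() if queue else None


def handle_next(broker, topic):
    """Simulate the backend for `topic` consuming one request."""
    request = broker.poll(topic)
    if request is None:
        return None
    # A real backend would do its work here and expose the result via REST.
    return {"handled_by": TOPICS[topic], "file": request["file"]}


broker = InMemoryBroker()
broker.publish("text_extraction", {"file": "paper.pdf"})
broker.publish("image_extraction", {"file": "paper.pdf"})

result_text = handle_next(broker, "text_extraction")
result_images = handle_next(broker, "image_extraction")
print(result_text)
print(result_images)
```

Because each backend reads only from its own queue, more instances of a backend could in principle be added to share the load, which is the kind of scaling the overview refers to. How the actual components partition work is not covered on this page.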

Required Plugins
