Giles Ecosystem Home
The Giles Ecosystem is a distributed system for extracting images and text from PDFs and for running OCR on images and PDFs. It can be easily scaled to accommodate higher workloads. The Giles Ecosystem is being developed by the Digital Innovation Group at Arizona State University.
If you are an end user and just want to use Giles, head over to the User Documentation. If you are a developer interested in setting up the Giles Ecosystem, check out the Developer Documentation. If you are trying to connect your application to Giles, see the API Documentation.
System Requirements
- Apache Zookeeper (https://zookeeper.apache.org/)
- Apache Kafka (https://kafka.apache.org/)
- MySQL (or PostgreSQL)
- Digilib (http://digilib.sourceforge.net/)
- Tomcat 8
- Java 8
- Solr (if Freddie is added to the system)
Relevant GitHub Repositories
Apps
- Giles: giles-eco-giles-web
Frontend for upload and retrieval of images. Provides REST interface as well as GUI.
- Nepomuk: giles-eco-nepomuk
Storage backend. Retrieves storage requests through Kafka and provides REST interface to retrieve stored files.
- Cepheus: giles-eco-cepheus
PDF image extraction backend. Extracts images from PDFs. Retrieves extraction requests through Kafka and provides REST interface to retrieve extraction results.
- Andromeda: giles-eco-andromeda
PDF text extraction backend. Extracts text from PDFs. Retrieves extraction requests through Kafka and provides REST interface to retrieve extraction results.
- Cassiopeia: giles-eco-cassiopeia
Wrapper for Tesseract to run OCR on submitted and extracted images.
- Freddie: giles-eco-freddie
Connector for Solr
- September: giles-eco-september
Monitoring component for the Giles Ecosystem
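Several of the backends listed above share one pattern: a request arrives through Kafka, the backend processes it, and the result is then available through a REST interface. The following is a minimal in-memory sketch of that pattern only. It does not use Kafka or HTTP, and the names ExtractionRequest, ExtractionBackend, process_pending and get_result are hypothetical illustrations, not part of the Giles code base or its message format.

```python
import queue
from dataclasses import dataclass


@dataclass
class ExtractionRequest:
    # Hypothetical message shape; the real Giles message format is not shown here.
    request_id: str
    document_name: str


class ExtractionBackend:
    """In-memory stand-in for a backend such as Cepheus or Andromeda."""

    def __init__(self, requests):
        self.requests = requests  # stands in for a Kafka topic
        self.results = {}         # stands in for data served over REST

    def process_pending(self):
        """Consume every queued request, store a placeholder result,
        and return how many requests were handled."""
        handled = 0
        while True:
            try:
                req = self.requests.get_nowait()
            except queue.Empty:
                return handled
            # A real backend would extract images or text here.
            self.results[req.request_id] = f"extracted:{req.document_name}"
            handled += 1

    def get_result(self, request_id):
        """Stands in for a REST GET; returns None if nothing is stored."""
        return self.results.get(request_id)


topic = queue.Queue()
backend = ExtractionBackend(topic)
topic.put(ExtractionRequest("req-1", "paper.pdf"))
backend.process_pending()
print(backend.get_result("req-1"))
```

Decoupling submission (the queue) from retrieval (the lookup) is what lets more backend instances be added under load without changing the frontend.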
Required Plugins