Giles Ecosystem Home


The Giles Ecosystem is a distributed system for extracting images and text from PDFs and for running OCR on images and PDFs. It can easily be scaled to handle higher workloads. The Giles Ecosystem is being developed by the Digital Innovation Group at Arizona State University.

If you are an end user and just want to use Giles, head over to the User Documentation. If you are a developer interested in setting up the Giles Ecosystem, check out the Developer Documentation. If you are trying to connect your application to Giles, see the API Documentation.



System Requirements 

Relevant GitHub Repositories

Apps

  • Giles: giles-eco-giles-web
    Frontend for uploading and retrieving images. Provides a REST interface as well as a GUI.
  • Nepomuk: giles-eco-nepomuk
    Storage backend. Receives storage requests through Kafka and provides a REST interface for retrieving stored files.
  • Cepheus: giles-eco-cepheus
    PDF image extraction backend. Extracts images from PDFs. Receives extraction requests through Kafka and provides a REST interface for retrieving extraction results.
  • Andromeda: giles-eco-andromeda
    PDF text extraction backend. Extracts text from PDFs. Receives extraction requests through Kafka and provides a REST interface for retrieving extraction results.
  • Cassiopeia: giles-eco-cassiopeia
    Wrapper for Tesseract to run OCR on submitted and extracted images.
  • Freddie: giles-eco-freddie
    Connector for Solr.
  • September: giles-eco-september
    Monitoring component for the Giles Ecosystem.
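The components above communicate by passing requests through Kafka rather than calling each other directly. The following is a minimal, self-contained sketch of that message-driven flow, not the actual Giles implementation. It assumes an in-memory queue per topic as a stand-in for Kafka. The topic names, the message shape, and the functions `submit_upload` and `extract_images` are all hypothetical. The routing (PDFs go to image and text extraction, uploaded images go straight to OCR, extracted page images are then queued for OCR) is an assumption based on the component descriptions above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Hypothetical topic names; the real Giles Ecosystem topics are not documented here.
STORAGE_TOPIC = "storage-requests"
IMAGE_EXTRACTION_TOPIC = "image-extraction-requests"
TEXT_EXTRACTION_TOPIC = "text-extraction-requests"
OCR_TOPIC = "ocr-requests"


@dataclass
class InMemoryBroker:
    """Stand-in for Kafka: one FIFO queue per topic (assumption, not Kafka's API)."""
    topics: dict = field(default_factory=lambda: defaultdict(deque))

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic):
        """Remove and return all pending messages on a topic, oldest first."""
        queue = self.topics[topic]
        messages = []
        while queue:
            messages.append(queue.popleft())
        return messages


def submit_upload(broker, file_id, content_type):
    """Hypothetical Giles-frontend routing after an upload."""
    broker.publish(STORAGE_TOPIC, {"file_id": file_id})  # handled by a Nepomuk-like consumer
    if content_type == "application/pdf":
        broker.publish(IMAGE_EXTRACTION_TOPIC, {"file_id": file_id})  # Cepheus-like
        broker.publish(TEXT_EXTRACTION_TOPIC, {"file_id": file_id})   # Andromeda-like
    elif content_type.startswith("image/"):
        broker.publish(OCR_TOPIC, {"file_id": file_id})  # Cassiopeia-like


def extract_images(broker, pages_per_file):
    """Hypothetical Cepheus-like consumer: queue every extracted page image for OCR."""
    for request in broker.consume(IMAGE_EXTRACTION_TOPIC):
        for page in range(1, pages_per_file[request["file_id"]] + 1):
            broker.publish(OCR_TOPIC, {"file_id": f"{request['file_id']}-page{page}"})


if __name__ == "__main__":
    broker = InMemoryBroker()
    submit_upload(broker, "doc-1", "application/pdf")
    submit_upload(broker, "img-1", "image/png")
    extract_images(broker, {"doc-1": 2})
    print([m["file_id"] for m in broker.consume(OCR_TOPIC)])
    # prints ['img-1', 'doc-1-page1', 'doc-1-page2']
```

Because each backend only reads from its own topic, additional instances of a backend can be added to absorb higher workloads without changing the other components. This is the general property of a message-driven design, sketched here under the assumptions stated above.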

Required Plugins
