Installation


There are two ways to get up and running with the Giles Ecosystem: using Docker or installing it directly on your infrastructure. We recommend Docker for evaluation and testing, and a direct installation of all components on your own machines for production.


Using Docker

You can find a Docker Compose file with setup instructions for the Giles Ecosystem here: giles-eco-docker.


Full Installation

The Giles Ecosystem consists of the following components: Giles, Nepomuk, Cepheus, Cassiopeia, Andromeda, Digilib, Kafka, Zookeeper, and MySQL/PostgreSQL. Ideally, each component is installed on its own machine to maximize performance, but they can all be installed on a single machine (e.g. for development or testing purposes).

Third-party Components

Apache Zookeeper

Please consult the Apache Zookeeper page for installation instructions.

Apache Kafka

Please consult the Apache Kafka page for installation instructions.

Apache Tomcat

All Giles Ecosystem components need to be deployed in a servlet container such as Apache Tomcat. Please consult the Apache Tomcat page for installation instructions. The Giles Ecosystem has been tested with Apache Tomcat 8. Apache Tomcat 7 might work as well.

We recommend configuring Tomcat to use UTF-8 as the character encoding for URLs. To do that, follow the instructions here.

MySQL/PostgreSQL

Giles (since v0.5) and Nepomuk (since v0.6) require a relational database as backend. Currently, both components ship with drivers for MySQL and PostgreSQL. Please consult the respective documentation for installation instructions.
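The Giles and Nepomuk sections below configure the database through the properties db.driver, db.url, db.username, db.password, and hibernate.dialect. As a rough sketch of how these values fit together, the following helper builds them for either database. The driver class names are the ones named in this guide; the Hibernate dialect class names, the PostgreSQL port 5432, and the helper itself are assumptions for illustration, so check them against your own setup.

```python
# Driver class names come from this guide. The dialect class names and the
# PostgreSQL default port are assumptions to verify for your installation.
DB_DEFAULTS = {
    "mysql": {
        "driver": "com.mysql.jdbc.Driver",
        "dialect": "org.hibernate.dialect.MySQLDialect",
        "scheme": "jdbc:mysql",
        "port": 3306,
    },
    "postgresql": {
        "driver": "org.postgresql.Driver",
        "dialect": "org.hibernate.dialect.PostgreSQLDialect",
        "scheme": "jdbc:postgresql",
        "port": 5432,
    },
}


def db_properties(kind, database, username, password, host="localhost", port=None):
    """Return the db.* and hibernate.dialect entries for config.properties."""
    defaults = DB_DEFAULTS[kind]
    port = defaults["port"] if port is None else port
    return {
        "db.driver": defaults["driver"],
        "db.url": f"{defaults['scheme']}://{host}:{port}/{database}",
        "db.username": username,
        "db.password": password,
        "hibernate.dialect": defaults["dialect"],
    }


# Print the lines as they would appear in config.properties
# (database name, user, and password are placeholders).
for key, value in db_properties("postgresql", "giles", "giles_user", "change-me").items():
    print(f"{key}={value}")
```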

Digilib

Please consult the Digilib page for installation instructions.

Giles Ecosystem Components

Versions

Versioning in the Giles Ecosystem works as follows. Each component has its own version number of the form MAJOR.MINOR.PATCH. Generally, you should use the latest versions of all components, as these are guaranteed to be compatible with each other. Matching minor version numbers indicate compatibility: Giles v0.5 is guaranteed to work with Nepomuk v0.5, but might not work with Nepomuk v0.4.5. However, if there is no version v0.5 of Nepomuk, the latest v0.4.X is compatible.

This rule sometimes leads to odd gaps in version numbers. Suppose Cepheus has only been patched over several releases while other components had minor or major updates. A change to a basic part of the system that requires all components to be updated might then move Cepheus from v0.1.X straight to v0.4, while the other components move from v0.3.X to v0.4.
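One way to read the compatibility rule is as a small selection procedure: given the version of one component, pick the latest release of another component with the same MAJOR.MINOR, and fall back to the latest lower release if none exists. The sketch below follows that reading. The function names are hypothetical, and generalizing the fallback to "latest release not newer than the reference" is an assumption rather than something the rule states explicitly.

```python
import re


def parse_version(text):
    """Turn 'v0.4.5' or '0.5' into a (major, minor, patch) tuple; a missing patch counts as 0."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR[.PATCH] version: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def pick_compatible(reference, available):
    """Pick the release of another component that matches `reference`.

    Prefers the latest patch with the same MAJOR.MINOR; if none exists,
    falls back to the latest release with a lower MAJOR.MINOR.
    Returns None if every available release is newer than the reference.
    """
    ref_major, ref_minor, _ = parse_version(reference)
    candidates = [
        version for version in available
        if parse_version(version)[:2] <= (ref_major, ref_minor)
    ]
    if not candidates:
        return None
    return max(candidates, key=parse_version)


# Giles v0.5 with several Nepomuk releases available (made-up release list).
print(pick_compatible("v0.5", ["v0.4.5", "v0.5.0", "v0.5.2", "v0.6.0"]))  # v0.5.2
```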

Giles

Giles needs the following software to be installed:

  • MySQL or PostgreSQL
    Create a database for Giles and a user with access to it. Then run the script located here to generate the necessary table structure. If you named the database something other than giles, change the use statement in the script to match your database name.
  • Tomcat 8

You can either build Giles from source by downloading Giles' source code or download the war files uploaded for a release. This page explains how to install Giles using the provided war file. The download page is https://github.com/diging/giles-eco-giles-web/releases. In most cases, you should choose the latest release.

Once downloaded, follow these steps:

  1. Unpack the war file (e.g. by changing its ending to ".zip" and unzipping it)
  2. Find the file WEB-INF/classes/config.properties and edit the following properties:
    • giles_files_tmp_dir: an absolute path to the directory where Giles should store its temporary files (files uploaded by users that haven't been processed yet).
    • kafka_hosts: change this property if your Kafka server is not running on the same machine as Giles, or if it is running on a port other than the default (9092).
    • SINCE V0.5 db.driver: the driver appropriate for your database (e.g. com.mysql.jdbc.Driver for MySQL or org.postgresql.Driver for PostgreSQL, see this page for more drivers).
    • SINCE V0.5 db.url: the connection URL for your database.
    • SINCE V0.5 db.username: the username used to connect to the database.
    • SINCE V0.5 db.password: the password used to connect to the database.
    • SINCE V0.5 hibernate.dialect: the dialect used for your database (see this page for a list of dialects).
    • SINCE V0.6 zookeeper_host: host name where Zookeeper is running (e.g. localhost or my.server.org).
    • SINCE V0.6 zookeeper_port: port on which Zookeeper is running (default is 2181).
    • SINCE V0.6 email_enabled: set to true if email notifications should be sent; otherwise false.
    • SINCE V0.6 email_from: the email address notifications should be sent from.
    • All other properties can later be changed through the webapp itself.
  3. Find the file WEB-INF/spring/email-config.xml and edit it as follows:

    • Set the host property to your email host name by editing the value attribute:

      <property name="host" value="your.email.host"/>
    • Set the port property to your email host port by editing the value attribute:

      <property name="port" value="email.server.port"/>


    • Set the username property to the username to use to connect to your email server:

      <property name="username" value="username.for.emailserver"/>


    • Set the password property to the password to be used to connect to your email server:

      <property name="password" value="password.for.emailserver"/>


  4. Find the file WEB-INF/classes/user.properties and edit the admin password:
    • admin=adminPasswordBCrypted,ROLE_ADMIN,enabled: the password is the first value after the equal sign (adminPasswordBCrypted). The password should be hashed using the bcrypt algorithm with strength 10.
  5. Find the file WEB-INF/classes/META-INF/persistence.xml and change it as follows:
    • There are three lines that start with <property name="javax.persistence.jdbc.url".  In each line, replace /path/to/giles/dbfiles/folder with the path to the folder that should store Giles' DB files. It should look something like this:

      <property name="javax.persistence.jdbc.url" value="/path/to/giles/db/folder/users.odb"/>

       Make sure to keep the file name at the end of each line.

  6. Find the file WEB-INF/spring/spring-security.xml and change the following lines:

    <beans:bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    	<beans:property name="driverClassName" value="com.mysql.jdbc.Driver" />
    	<beans:property name="url" value="jdbc:mysql://localhost:3306/giles" />
    	<beans:property name="username" value="" />
    	<beans:property name="password" value="" />
    </beans:bean>

    Change the values of username and password to the username and password of your DB user. If you did not name the new DB giles, change the url property to reflect the database name (e.g. if you named the database gilesdb, replace jdbc:mysql://localhost:3306/giles with jdbc:mysql://localhost:3306/gilesdb).
    If you are using PostgreSQL instead of MySQL, replace the driver class name with org.postgresql.Driver and use a jdbc:postgresql:// URL.

  7. Now, generate a new war file from the unpacked and changed files and deploy it in your Tomcat. 

    If you are on a Unix-based operating system, you can do this for example by running the command jar -cvf giles.war . from inside the unpacked Giles folder.

  8. Once deployed, Giles should be accessible at http://your.server/giles.
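Steps 1, 2, and 7 above (unpack, edit config.properties, repack) can also be scripted. The following is a minimal sketch using only the Python standard library, not an official tool: the function names, file paths, and example values are placeholders, and it only handles simple key=value lines. Unlike jar -cvf, it does not generate a manifest; it repacks whatever the unpacked folder contains.

```python
import os
import zipfile


def unpack_war(war_path, target_dir):
    """Extract the WAR (a zip archive) into target_dir."""
    with zipfile.ZipFile(war_path) as war:
        war.extractall(target_dir)


def set_properties(path, updates):
    """Rewrite key=value lines in a .properties file, keeping all other lines.

    Keys from `updates` that are not present yet are appended at the end.
    Only the simple 'key=value' form is handled (no line continuations).
    latin-1 round-trips every byte, so untouched lines are preserved as-is.
    """
    remaining = dict(updates)
    with open(path, encoding="latin-1") as handle:
        lines = handle.read().splitlines()
    result = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith(("#", "!"))
        if "=" in line and not is_comment and key in remaining:
            result.append(f"{key}={remaining.pop(key)}")
        else:
            result.append(line)
    result.extend(f"{key}={value}" for key, value in remaining.items())
    with open(path, "w", encoding="latin-1") as handle:
        handle.write("\n".join(result) + "\n")


def pack_war(source_dir, war_path):
    """Zip the files below source_dir into war_path, with paths relative to source_dir.

    war_path should be located outside source_dir so the archive
    does not try to include itself.
    """
    with zipfile.ZipFile(war_path, "w", zipfile.ZIP_DEFLATED) as war:
        for folder, _, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                war.write(full_path, os.path.relpath(full_path, source_dir))


# Example (all paths and values are placeholders):
# unpack_war("downloads/giles.war", "work/giles")
# set_properties("work/giles/WEB-INF/classes/config.properties",
#                {"giles_files_tmp_dir": "/var/giles/tmp",
#                 "zookeeper_host": "my.server.org"})
# pack_war("work/giles", "dist/giles.war")
```

The XML files from steps 3 to 6 still have to be edited separately before repacking.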

From version v0.5 onwards, Giles no longer uses ObjectDB as its database backend. If you set up a new Giles installation (not an update with existing data), you probably still have to execute step 5, but your data will be stored in the relational database you specify (e.g. MySQL). Step 5 remains part of the setup process for backwards compatibility and data migration purposes.

Upgrading to v0.5

If you are upgrading from an earlier version to version v0.5, you will have to migrate existing data as follows:

  1. Make sure Giles is running without exceptions with the new version.
  2. Reregister the other components with Giles (Nepomuk, Cassiopeia, Cepheus) under "Apps".
  3. Go to http://your.giles.server/giles-root/admin/migrate
  4. Enter the username of the user whose data you want to migrate. The username will be a combination of username and provider id. For example for GitHub: githubusername_github. Depending on how much data the user has uploaded, this might take a while.
  5. Once the migration is done for a user, you will see some statistics about how many objects were migrated.

Nepomuk

Nepomuk needs the following software to be installed:

  • Tomcat 8
  • SINCE V0.6 MySQL or PostgreSQL: Nepomuk uses a relational database as backend and currently ships with drivers for both.

You can either build Nepomuk from source by downloading Nepomuk's source code or download the war files uploaded for a release. This page explains how to install Nepomuk using the provided war file. The download page is https://github.com/diging/giles-eco-nepomuk/releases. In most cases, you should choose the latest release.

Once downloaded, follow these steps:

  1. Unpack the war file (e.g. by changing its ending to ".zip" and unzipping it)
  2. Find the file WEB-INF/classes/config.properties and edit the following properties:
    • app_base_url: the base URL of Nepomuk, such as https://your.nepomuk.server/nepomuk.
    • kafka_hosts: change this property if your Kafka server is not running on the same machine as Nepomuk, or if it is running on a port other than the default (9092).
    • SINCE V0.6 db.driver: the driver appropriate for your database (e.g. com.mysql.jdbc.Driver for MySQL or org.postgresql.Driver for PostgreSQL, see this page for more drivers).
    • SINCE V0.6 db.url: the connection URL for your database.
    • SINCE V0.6 db.username: the username used to connect to the database.
    • SINCE V0.6 db.password: the password used to connect to the database.
    • SINCE V0.6 hibernate.dialect: the dialect used for your database (see this page for a list of dialects).
    • SINCE V0.6 hibernate.show_sql: set this to true if you want Hibernate to print the SQL statements it runs; otherwise false.
    • SINCE V0.6 zookeeper_host: host name where Zookeeper is running (e.g. localhost or my.server.org).
    • SINCE V0.6 zookeeper_port: port on which Zookeeper is running (default is 2181).
  3. Find the file WEB-INF/classes/user.properties and edit the admin password:
    • admin=adminPassword,ROLE_ADMIN,enabled: the password is the first value after the equal sign (adminPassword). The password should be hashed using the bcrypt algorithm with strength 10.
  4. Find the file WEB-INF/spring/root-context.xml and change the baseDirectory property of the following bean definitions:

    <bean id="imageStorageManager"
    	class="edu.asu.diging.gilesecosystem.nepomuk.core.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/image/parent/folder/" />
    	<property name="fileTypeFolder" value="images"></property>
    </bean>
    
    <bean id="pdfStorageManager"
    	class="edu.asu.diging.gilesecosystem.nepomuk.core.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/files/parent/folder/" />
    	<property name="fileTypeFolder" value="pdfs"></property>
    </bean>
    
    <bean id="textStorageManager"
    	class="edu.asu.diging.gilesecosystem.nepomuk.core.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/files/parent/folder/" />
    	<property name="fileTypeFolder" value="texts"></property>
    </bean>
    
    <bean id="otherStorageManager"
    	class="edu.asu.diging.gilesecosystem.nepomuk.core.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/files/parent/folder/" />
    	<property name="fileTypeFolder" value="others"></property>
    </bean>

    Each base directory needs to point to a folder to store images, pdfs, texts, or other files.

  5. Now, generate a new war file from the unpacked and changed files and deploy it in your Tomcat. 

    If you are on a Unix-based operating system, you can do this for example by running the command jar -cvf nepomuk.war . from inside the unpacked Nepomuk folder.

  6. Once deployed, Nepomuk should be accessible at http://your.server/nepomuk.
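Step 3 above expects the admin password to be a bcrypt hash with strength 10. Before repacking, it can help to confirm that the entry really contains such a hash rather than a plain-text password. The sketch below only checks the textual format of a user.properties line; it does not create hashes (that requires a bcrypt tool or library). The function name is hypothetical, and which hash prefixes the components accept is an assumption.

```python
import re

# A bcrypt hash is "$2a$" (or a "$2b$"/"$2y$" variant), a two-digit cost,
# "$", and then 53 characters: a 22-character salt and a 31-character hash.
BCRYPT_HASH = re.compile(r"\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}")


def check_user_entry(line, expected_cost=10):
    """Check a 'name=password,ROLE,enabled' line from user.properties.

    Returns (username, problem); problem is None when the password field
    looks like a bcrypt hash with the expected cost.
    """
    username, _, rest = line.strip().partition("=")
    password = rest.split(",", 1)[0]
    match = BCRYPT_HASH.fullmatch(password)
    if match is None:
        return username, "password is not a bcrypt hash"
    cost = int(match.group(1))
    if cost != expected_cost:
        return username, f"bcrypt cost is {cost}, expected {expected_cost}"
    return username, None


# A plain-text password is reported as a problem.
print(check_user_entry("admin=secret,ROLE_ADMIN,enabled"))
```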


Upgrading to v0.6

If you are upgrading from an earlier version to version v0.6, you will have to migrate existing data as follows:

  1. Make sure Nepomuk is running without exceptions with the new version.
  2. Go to http://your.nepomuk.server/nepomuk-root/admin/migrate
  3. Enter the username of the user whose data you want to migrate. The username will be a combination of provider id and provider user id. For example for GitHub: github_123456. Depending on how much data the user has uploaded, this might take a while.
  4. Once the migration is done for a user, you will see some statistics about how many objects were migrated.

Cepheus

Cepheus needs the following software to be installed:

  • Tomcat 8

You can either build Cepheus from source by downloading Cepheus' source code or download the war files uploaded for a release. This page explains how to install Cepheus using the provided war file. The download page is https://github.com/diging/giles-eco-cepheus/releases. In most cases, you should choose the latest release.

Once downloaded, follow these steps:

  1. Unpack the war file (e.g. by changing its ending to ".zip" and unzipping it)
  2. Find the file WEB-INF/classes/config.properties and edit the following properties:
    • cepheus_url: the base URL of Cepheus, such as https://your.cepheus.server/cepheus.
    • kafka_hosts: change this property if your Kafka server is not running on the same machine as Cepheus, or if it is running on a port other than the default (9092).
    • If you want Cepheus to create an image format other than TIFF, use a different dpi value, or produce a different type of image than RGB, you can change those settings in this file as well.
  3. Find the file WEB-INF/classes/user.properties and edit the admin password:
    • admin=AdminPassword,ROLE_ADMIN,enabled: the password is the first value after the equal sign (AdminPassword).
  4. Find the file WEB-INF/spring/root-context.xml and change the baseDirectory property of the following bean definition:

    <bean id="fileStorageManager" class="edu.asu.diging.gilesecosystem.util.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/cepheus/folder/" />
    	<property name="fileTypeFolder" value="tmp"></property>
    </bean>

    The base directory property needs to point to a folder where Cepheus will store temporary files.

  5. Now, generate a new war file from the unpacked and changed files and deploy it in your Tomcat. 

    If you are on a Unix-based operating system, you can do this for example by running the command jar -cvf cepheus.war . from inside the unpacked Cepheus folder.

  6. Once deployed, Cepheus should be accessible at http://your.server/cepheus.
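The baseDirectory change from step 4 recurs for Nepomuk, Cepheus, Cassiopeia, and Andromeda, so it may be worth scripting. The sketch below is one possible approach, not part of the Giles tooling: it patches the value attribute as text rather than parsing and re-serializing the XML, so comments, namespaces, and formatting in the Spring file stay untouched. It assumes the bean is written as shown in step 4 (id as the first attribute, double-quoted values); the function name is hypothetical.

```python
import re
from xml.sax.saxutils import escape


def set_base_directory(xml_text, bean_id, directory):
    """Return xml_text with the baseDirectory value of bean `bean_id` replaced.

    Raises KeyError if the bean or its baseDirectory property is not found.
    """
    pattern = re.compile(
        r'(<bean\s+id="' + re.escape(bean_id) + r'"'
        r'(?:(?!</bean>).)*?'                      # stay inside this bean
        r'<property\s+name="baseDirectory"\s+value=")[^"]*(")',
        re.DOTALL,
    )
    # Escape characters that are not allowed inside an XML attribute value.
    safe_value = escape(directory, {'"': "&quot;"})
    new_text, count = pattern.subn(
        lambda match: match.group(1) + safe_value + match.group(2),
        xml_text,
        count=1,
    )
    if count == 0:
        raise KeyError(f"no baseDirectory property found for bean {bean_id!r}")
    return new_text


# The bean definition shown in step 4, with a placeholder target folder.
snippet = '''<bean id="fileStorageManager" class="edu.asu.diging.gilesecosystem.util.files.impl.FileStorageManager">
    <property name="baseDirectory" value="/path/to/cepheus/folder/" />
    <property name="fileTypeFolder" value="tmp"></property>
</bean>'''
print(set_base_directory(snippet, "fileStorageManager", "/var/cepheus/"))
```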

JBIG2 Images

If you expect to work with PDF files that contain images in the JBIG2 format, you need to add the levigo-jbig2-imageio library to your Tomcat's lib folder.

Cassiopeia

Cassiopeia needs the following software to be installed:

  • Tomcat 8
  • Tesseract

You can either build Cassiopeia from source by downloading Cassiopeia's source code or download the war files uploaded for a release. This page explains how to install Cassiopeia using the provided war file. The download page is https://github.com/diging/giles-eco-cassiopeia/releases. In most cases, you should choose the latest release.

Once downloaded, follow these steps:

  1. Unpack the war file (e.g. by changing its ending to ".zip" and unzipping it)
  2. Find the file WEB-INF/classes/config.properties and edit the following properties:
    • cassiopeia_url: the base URL of Cassiopeia, such as https://your.cassiopeia.server/cassiopeia.
    • kafka_hosts: change this property if your Kafka server is not running on the same machine as Cassiopeia, or if it is running on a port other than the default (9092).
    • tesseract_bin_folder: the folder containing the Tesseract executable. Default is /usr/bin.
    • tesseract_data_folder: the folder where tessdata is located. Default is /usr/share/tesseract/.
    • tesseract_create_hocr: set this property to true if you want Cassiopeia to create hOCR instead of plain text.
  3. Find the file WEB-INF/classes/user.properties and edit the admin password:
    • admin=AdminPassword,ROLE_ADMIN,enabled: the password is the first value after the equal sign (AdminPassword).
  4. Find the file WEB-INF/spring/root-context.xml and change the baseDirectory property of the following bean definition:

    <bean id="fileStorageManager" class="edu.asu.diging.gilesecosystem.util.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/cassiopeia/folder/" />
    	<property name="fileTypeFolder" value="tmp"></property>
    </bean>

    The base directory property needs to point to a folder where Cassiopeia will store temporary files.

  5. Now, generate a new war file from the unpacked and changed files and deploy it in your Tomcat. 

    If you are on a Unix-based operating system, you can do this for example by running the command jar -cvf cassiopeia.war . from inside the unpacked Cassiopeia folder.

  6. Once deployed, Cassiopeia should be accessible at http://your.server/cassiopeia.
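Since Cassiopeia depends on a local Tesseract installation, a quick check of the two Tesseract paths from step 2 can save a failed deployment. The sketch below uses the defaults given above; that the executable is named tesseract is an assumption, and the data folder is only checked for existence because its expected contents are not specified here.

```python
import os


def check_tesseract(bin_folder="/usr/bin", data_folder="/usr/share/tesseract/"):
    """Return a list of problems with the Tesseract paths; an empty list means both look fine."""
    problems = []
    # Assumption: the executable inside tesseract_bin_folder is called "tesseract".
    executable = os.path.join(bin_folder, "tesseract")
    if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
        problems.append(f"no executable found at {executable}")
    if not os.path.isdir(data_folder):
        problems.append(f"data folder {data_folder} does not exist")
    return problems


# Check the default locations and report anything that is missing.
for problem in check_tesseract():
    print("Tesseract check:", problem)
```

Pass the values of tesseract_bin_folder and tesseract_data_folder as arguments if you changed them in config.properties.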

Andromeda

Andromeda needs the following software to be installed:

  • Tomcat 8

You can either build Andromeda from source by downloading Andromeda's source code or download the war files uploaded for a release. This page explains how to install Andromeda using the provided war file. The download page is https://github.com/diging/giles-eco-andromeda/releases. In most cases, you should choose the latest release.

Once downloaded, follow these steps:

  1. Unpack the war file (e.g. by changing its ending to ".zip" and unzipping it)
  2. Find the file WEB-INF/classes/config.properties and edit the following properties:
    • andromeda_url: the base URL of Andromeda, such as https://your.andromeda.server/andromeda.
    • kafka_hosts: change this property if your Kafka server is not running on the same machine as Andromeda, or if it is running on a port other than the default (9092).
  3. Find the file WEB-INF/classes/user.properties and edit the admin password:
    • admin=AdminPassword,ROLE_ADMIN,enabled: the password is the first value after the equal sign (AdminPassword).
  4. Find the file WEB-INF/spring/root-context.xml and change the baseDirectory property of the following bean definition:

    <bean id="fileStorageManager" class="edu.asu.diging.gilesecosystem.util.files.impl.FileStorageManager">
    	<property name="baseDirectory" value="/path/to/andromeda/folder/" />
    	<property name="fileTypeFolder" value="tmp"></property>
    </bean>

    The base directory property needs to point to a folder where Andromeda will store temporary files.

  5. Now, generate a new war file from the unpacked and changed files and deploy it in your Tomcat. 

    If you are on a Unix-based operating system, you can do this for example by running the command jar -cvf andromeda.war . from inside the unpacked Andromeda folder.

  6. Once deployed, Andromeda should be accessible at http://your.server/andromeda.
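Every component reads kafka_hosts, and Giles and Nepomuk also read zookeeper_host and zookeeper_port. If a component does not start cleanly, checking that those endpoints are reachable from the Tomcat machine is a cheap first step. The sketch below assumes kafka_hosts holds a comma-separated list of host:port entries, which this guide does not confirm. It only tests that a TCP connection can be opened, not that Kafka or Zookeeper are healthy, and it does not handle IPv6 literals.

```python
import socket


def parse_hosts(value, default_port=9092):
    """Split 'host1:9092,host2' into [(host, port), ...]; a missing port gets default_port."""
    endpoints = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, separator, port = entry.rpartition(":")
        if not separator:
            host, port = entry, default_port
        endpoints.append((host, int(port)))
    return endpoints


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with the default ports mentioned in this guide (hosts are placeholders):
# for host, port in parse_hosts("localhost:9092") + parse_hosts("localhost", default_port=2181):
#     print(host, port, "reachable" if is_reachable(host, port) else "NOT reachable")
```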

JBIG2 Images

If you expect to work with PDF files that contain images in the JBIG2 format, you might need to add the levigo-jbig2-imageio library to your Tomcat's lib folder.