Installation
This page describes how to install Giles. Giles depends on two other web applications: Digilib and Jars. Digilib serves up images and Jars manages their metadata. You can run Giles without Jars, but a number of links will not work in that case.
System Requirements
- Java 8
- Tomcat 7
- Digilib
- Jars
- Tesseract
Note: if you don't have Tesseract installed, you can disable the OCR feature (see properties below) to keep Giles from trying to run OCR on images. If the OCR feature is enabled but Tesseract is not installed or configured, Giles will still work; the OCR step simply fails silently.
Notes on Giles & Digilib
Giles works under the assumption that it is running on the same machine as your Digilib installation. When uploading a file, Giles simply puts that file into a folder accessible by Digilib. If you have different requirements, Giles won't work for you out of the box. However, if you are comfortable programming Java, you could implement your own FileStorageManager (edu.asu.giles.files.impl.FileStorageManager). This is the class that handles file storage in Giles.
Giles adds authentication and authorization to Digilib. This only works effectively if you add authentication to Digilib as well. The easiest way to do that is to let Tomcat close down access to Digilib completely and only allow requests from localhost. If you want parts of your Digilib contents to be open to the public or to specific groups or users, please consult the Digilib documentation.
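One way to restrict Digilib to localhost is Tomcat's RemoteAddrValve. The following is a minimal sketch, not a tested Giles setup: it assumes Digilib is deployed under the context name "digilib", so that Tomcat picks up a context descriptor at conf/Catalina/localhost/digilib.xml. The function name and the path are illustrative.

```shell
#!/bin/sh
# Write a Tomcat context descriptor that only admits requests from localhost.
# ASSUMPTION: Digilib is deployed under the context name "digilib", so the
# descriptor belongs at $CATALINA_BASE/conf/Catalina/localhost/digilib.xml.
write_localhost_only_context() {
    target="$1"   # full path of the context descriptor to create
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<Context>
  <!-- Admit IPv4 and IPv6 loopback addresses only; other clients are rejected. -->
  <Valve className="org.apache.catalina.valves.RemoteAddrValve"
         allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1"/>
</Context>
EOF
}

# Example call (the path is an assumption; adjust it to your Tomcat layout):
# write_localhost_only_context "$CATALINA_BASE/conf/Catalina/localhost/digilib.xml"
```

Restart Tomcat after writing the file so the valve takes effect.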
Giles and Databases
Giles uses two different databases. The goal is to change that in the future, but for now this is the way it is. Information about uploaded files is stored in several Db4o database files. The advantage of this is that they can easily be moved around and backed up.
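Because the Db4o data lives in plain files, a backup can be as simple as archiving the folder configured as db_files. The sketch below is one possible approach, not a procedure taken from Giles itself; the function name is made up, and it assumes Giles (Tomcat) is stopped first so that no file is written while the archive is created.

```shell
#!/bin/sh
# Archive the folder that holds Giles' Db4o files (the db_files property).
# ASSUMPTION: Giles (Tomcat) is stopped first so the files are not mid-write.
backup_giles_db() {
    db_dir="$1"      # folder configured as db_files
    out_dir="$2"     # folder that should receive the archive
    [ -d "$db_dir" ] || { echo "no such folder: $db_dir" >&2; return 1; }
    mkdir -p "$out_dir" || return 1
    archive="$out_dir/giles-db-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the database folder
    tar -czf "$archive" -C "$db_dir" . && echo "$archive"
}

# Example call (both paths are placeholders):
# backup_giles_db /path/to/db/files /path/to/backups
```

On success the function prints the path of the archive it created.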
Information about GitHub user account tokens is stored in a MySQL database. This is the default implementation of the Spring Social module that Giles uses. Before you deploy Giles, make sure to create a database and run the following script to create a table that holds information about which users authorized Giles to use their GitHub profile.
    create table UserConnection (
        userId varchar(255) not null,
        providerId varchar(255) not null,
        providerUserId varchar(255),
        rank int not null,
        displayName varchar(255),
        profileUrl varchar(512),
        imageUrl varchar(512),
        accessToken text not null,
        secret varchar(512),
        refreshToken varchar(512),
        expireTime bigint,
        primary key (userId, providerId, providerUserId)
    );
    create unique index UserConnectionRank on UserConnection(userId, providerId, rank);
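Creating the database and its user could look roughly like the sketch below. The helper only prints SQL, so you can review it before piping it into the mysql client. The database name, user name, host "localhost", the utf8 character set, and the file name userconnection.sql are assumptions for illustration. Also note that MySQL 8.0 treats rank as a reserved word, so on those versions the column name in the script above likely needs to be quoted with backticks.

```shell
#!/bin/sh
# Print the SQL that creates a database and a database user for Giles.
# ASSUMPTIONS: "localhost" and the utf8 character set are illustrative;
# the values passed in must not contain quote characters.
giles_db_setup_sql() {
    db="$1"; user="$2"; pass="$3"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$db\` CHARACTER SET utf8;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON \`$db\`.* TO '$user'@'localhost';
EOF
}

# Example calls (userconnection.sql is a hypothetical file holding the
# create table script from this page):
# giles_db_setup_sql giles giles 'GilesDbPassword' | mysql -u root -p
# mysql -u giles -p giles < userconnection.sql
```

The names used here correspond to the db.database.url, db.user and db.password properties described below.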
Building Giles
Giles is a Java/Spring web application that is built with Maven. The easiest way to build it is therefore to download the latest release and run Maven to create a war file. You can do that either in your favorite IDE or in a terminal. If you are comfortable using Maven, simply run mvn package, specifying the following parameters:
- giles.base.url: the final URL Giles will run at (e.g. http://myserver.org/giles). Giles will use that URL to build links to Digilib content.
- admin.password: the password for your admin user. It should be encoded with BCrypt (strength=4). Default is "admin".
- admin.username (optional): if you want your admin user to have a different name than "admin", you can specify that with this property.
- github.clientId: OAuth client id of your GitHub application registration.
- github.secret: OAuth client secret of your GitHub application registration.
- [starting with v0.7] google.clientId: OAuth client id of your Google application registration.
- [starting with v0.7] google.secret: OAuth client secret of your Google application registration.
- [starting with v0.7] mitreid.clientId: OAuth client id of your MITREid Connect application registration.
- [starting with v0.7] mitreid.secret: OAuth client secret of your MITREid Connect application registration.
- [starting with v0.7] mitreid.server.url: URL of your MITREid Connect server instance.
- [starting with v0.7] github.show.login: flag to specify if the GitHub login button should be shown on login page.
- [starting with v0.7] google.show.login: flag to specify if the Google login button should be shown on login page.
- [starting with v0.7] mitreid.show.login: flag to specify if the MITREid Connect login button should be shown on login page.
- [starting with v0.7] jwt.signing.secret: a secure-random secret key used to sign Giles API tokens.
- [starting with v0.7] jwt.signing.secret.apps: a secure-random secret key used to sign app tokens.
- db_files: Path to a folder in your file system that will hold Giles' database files.
- db.driver: if you don't use MySQL, you can specify the appropriate driver here. Note that if you are not using MySQL, you will also have to add the correct driver dependency to the pom.xml.
- db.database.url: the URL of the database (most likely something like jdbc:mysql://localhost:3306/giles).
- db.user: username of your database user.
- db.password: password of your database user.
- digilib.url: URL of your Digilib installation. The path should be the path to the Scaler servlet (e.g. http://myserver.org/digilib/servlet/Scaler).
- digilibBaseDir: path to the Digilib directory that should hold your images. Digilib has to have access to this directory.
- jars.url: URL of your Jars installation.
- [starting with v0.6] jars.file.url: path to file metadata in metadata service (automatically prefixed with value of jars.url)
- [starting with v0.6] metadata.upload.add: path to page in metadata service for adding metadata after an upload (automatically prefixed with value of jars.url)
- [starting with v0.6] metadata.service.doc.url: path to document metadata in metadata service (automatically prefixed with value of jars.url)
- buildNumber (optional): if you want Giles to show a specific version number, you can specify that version number with this property.
- pdfBaseDir: path to a folder in the file system to store uploaded PDFs
- pdf.conversion.dpi (optional): dpi used for converting PDFs to images. Default is 600.
- pdf.conversion.type (optional): image type to use when converting PDFs to images. Options are:
  - RGB (Red, Green, Blue) [default]
  - ARGB (Alpha, Red, Green, Blue)
  - GRAY (Shades of gray)
  - BINARY (Black or white)
- pdf.conversion.format (optional): image format to use when converting PDFs to images. Default is tiff. Should be one of the following: JPG, jpg, tiff, bmp, BMP, pcx, PCX, gif, GIF, WBMP, png, PNG, raw, RAW, JPEG, pnm, PNM, tif, TIF, TIFF, wbmp, jpeg
- [starting with v0.4] textBaseDir: path to a folder in the file system that holds extracted text files.
- [starting with v0.4] tesseract.bin: path to the parent folder of the tesseract executable. For example, if your tesseract executable is /usr/bin/tesseract, then this property should be set to /usr/bin/
- [starting with v0.4] tesseract.data: path to the parent folder of your tesseract tessdata folder. For example, if your tessdata folder is at /usr/share/tesseract/tessdata/, then this property should be set to /usr/share/tesseract/
- [starting with v0.9] tesseract.create.hocr (optional): true or false; if set to true, Giles will instruct tesseract to create hOCR instead of plain text. Default is false.
- [starting with v0.4] ocr.worker.count (optional): number of threads that are used to run tesseract. Default is 2.
- [starting with v0.4] ocr.images.from.pdfs (optional): this property defines whether OCR is run on images that were created from PDFs. Should be true or false. Default is true.
- [starting with v0.4] log.level (optional): if Maven is run using the mydev or test profile, this property sets the log level. Default is debug. The prod profile sets this property to info.
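A few of the values above can be derived on the command line. The helpers below are a sketch with made-up function names: one produces a random hex string that could serve as jwt.signing.secret or jwt.signing.secret.apps (the length of 32 bytes is an assumption, since no required size is stated here), and the other two turn a tesseract executable path and a tessdata path into the parent folders that tesseract.bin and tesseract.data expect.

```shell
#!/bin/sh
# Helpers for filling in some of the build properties.

# Random hex string read from the kernel's random source.
# ASSUMPTION: 32 bytes (64 hex characters) is chosen for illustration only.
random_secret() {
    bytes="${1:-32}"
    head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# tesseract.bin: folder containing the executable, with a trailing slash.
tesseract_bin_dir() {
    printf '%s/\n' "$(dirname "$1")"
}

# tesseract.data: parent of the tessdata folder, with a trailing slash.
tessdata_parent_dir() {
    printf '%s/\n' "$(dirname "${1%/}")"
}

# Example calls:
# random_secret
# tesseract_bin_dir "$(command -v tesseract)"
# tessdata_parent_dir /usr/share/tesseract/tessdata/
```

With the example paths from the list, tesseract_bin_dir /usr/bin/tesseract prints /usr/bin/ and tessdata_parent_dir /usr/share/tesseract/tessdata/ prints /usr/share/tesseract/.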
Step-by-step guide
You will probably not get around learning a little bit about Maven in order to build Giles, but this step-by-step guide hopefully makes it as easy as possible.
- Install Maven.
- In a terminal, go into the folder giles-{version}/giles-spring.
- You need only one command to build Giles: mvn clean package. This deletes all previously built files and generates a war file. However, you need to specify the properties listed above for Giles to work correctly.
- To specify a property when running Maven, the easiest way is to append -D{property_name}={property_value}. For example, if your database user is called "giles", you would append -Ddb.user=giles to the Maven command. The complete command would look like this: mvn clean package -Ddb.user=giles
- Append each property to the command as described in the previous step. Your complete command string will look something like this:
  mvn clean package -Ddb_files=/path/to/db/files -Dadmin.password=GilesPassword -Dgithub.clientId=githubClientId -Dgithub.secret=githubClientSecret -Ddb.driver=com.mysql.jdbc.Driver -Ddb.database.url=jdbc:mysql://localhost:3306/giles -Ddb.user=giles -Ddb.password=GilesDbPassword -Ddigilib.url=http://myserver.org/digilib/servlet/Scaler -DdigilibBaseDir=/path/to/digilib/images -Djars.url=http://myserver.org/jars -Dgiles.base.url=http://myserver.org/giles -DpdfBaseDir=/path/to/pdfs/folder
- Maven will create a new folder in giles-spring called target. If Maven ran successfully, you will find a file called giles.war inside this folder.
- Simply put giles.war into your Tomcat's webapps directory and you should be good to go!
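Typing a dozen -D options by hand is error-prone, so it can help to keep them in a file and let a small wrapper assemble the command. The following is a sketch under assumptions: the file name giles-build.properties and the function giles_build are made up for this example, and Giles itself does not read that file. The MVN variable only exists so the command can be swapped out, for instance with echo for a dry run.

```shell
#!/bin/sh
# Run "mvn clean package" with one -D option per name=value line of a file.
# ASSUMPTION: the properties file is a convention of this wrapper only.
giles_build() {
    props_file="$1"
    set -- clean package                  # start the argument list
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) ;;                   # skip blank lines and comments
            *=*) set -- "$@" "-D$line" ;; # name=value becomes -Dname=value
        esac
    done < "$props_file"
    "${MVN:-mvn}" "$@"                    # MVN=echo gives a dry run
}

# Example call, run from giles-{version}/giles-spring:
# giles_build giles-build.properties
```

A matching file would contain lines such as db.user=giles and giles.base.url=http://myserver.org/giles, one property per line.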
System Configs
Starting with version v0.5 a lot of the system properties can also be configured through the web app itself by any admin user.