3. Tutorial for Import Configuration Files
Test Data Setup
The following section demonstrates how you can import data from a database into XplainData’s software. To this end, you can unzip the file that contains our test data set (consisting of four CSV files).
Import these CSV files into your relational database. We assume that you
use the filenames as table name, e.g. the table citiescatalog
stems
from the file citiescatalog.csv
. Make sure to remember the name of
the database, the password and user you use to connect to this database.
After this, your database should consist of two tables which contain
transaction data (patients
and prescriptions
) and two tables
which contain catalog data (citiescatalog
and atccatalog
).
Please note: throughout the rest of this tutorial we will assume that the data has been imported into a MySQL database. We will use the following parameters to access this data within the next examples:
"databaseConnectionConfiguration": {
"databaseType": "MYSQL",
"url": "127.0.0.1",
"portNumber": 3306,
"user": "youruser",
"password": "yourpassword",
"databaseName": "xplain_training"
},
You will have to adopt these settings within the next examples with your parameters (user, password, name of the database, etc.)
Importing catalogs
Create a directory where all data and configuration files will later be
found. E.g. create a directory my_xplain_dir
. Create two
subdirectories: config
and data
. Do NOT alter the name of these
subdirectories.
Create a file import.xgenscript
in the config
folder. This file
will later be used to build all catalogs and to import all relational
data. Copy and paste the following into this file:
{
"tasks": [{
"name": "Generate ATC catalog",
"maxTaskThreads": 3,
"parallelTasks": [{
"task": "buildAttribute",
"attributeConfiguration": {
"databaseConnectionConfiguration": {
"databaseType": "MYSQL",
"url": "127.0.0.1",
"portNumber": 3306,
"user": "youruser",
"password": "yourpassword",
"databaseName": "xplain_training"
},
"attributeTableName": "atccatalog",
"attributeName": "pzn",
"keyFieldNameInAttributeTable": "pzn",
"hierarchyLevelColumnNames": ["atc5","atc4", "atc3",
"atc2", "atc1", "pzn"],
"hierarchyLevelNames": ["Level 1", "Level 2", "Level 3",
"Level 4", "Level 5", "PZN"]
},
"saveToFile": "atccatalog.xattribute"
}]
}]
}
As can be seen this configuration file consists of an array of task sets which will be executed later on. Currently there is only one task set defined, namely “Generate ATC catalog”. Each task set consists of an array of (parallel) tasks to be executed simultaneously. The only parallel task to be executed here is of the type “buildAttribute”.
Before we can start our first build run, we have to set the environment
variable xplainpath. Set it to your my_xplain_dir
path. For
instance, when using Linux / MacOS this can be done via
export xplainpath=/yourPathTo/my_xplain_dir/
After this we can start our first build run. This can be done with the following command:
cd /yourPathTo/my_xplain_dir/config java -jar /pathto/xplain.jar -g import.xgenscript
(/pathto has to be modified to the path where your copy of xplain.jar can be found). The output of this command should be something like:
17:00:29.218 [main] INFO com.xplaindata.ui.generation.GenerationScript - Starting script ...
17:00:29.220 [main] INFO com.xplaindata.ui.generation.GenerationScript - Executing task set "Generate ATC Catalog" ...
17:00:29.225 [pool-1-thread-1] INFO com.xplaindata.xtable.config.AttributeConfiguration - Query string for building the hierarchy of attribute pzn: SELECT atc5, atc4, atc3, atc2, atc1, pzn, pzn FROM atccatalog
17:00:29.225 [pool-1-thread-1] DEBUG com.xplaindata.itable.CategorialAttribute - Query string: SELECT atc5, atc4, atc3, atc2, atc1, pzn, pzn FROM atccatalog
17:00:30.567 [main] INFO com.xplaindata.ui.generation.GenerationScript - Task set "Generate ATC catalog" executed in [ms]: 1347
17:00:30.567 [main] INFO com.xplaindata.ui.generation.GenerationScript - ... script executed in [ms]: 1347
The second catalog will be build accordingly. Copy and paste the following after line 26 into your import.xgenscript file:
,"tasks": [{
"name": "Generate city catalog",
"maxTaskThreads": 3,
"parallelTasks": [{
"task": "buildAttribute",
"attributeConfiguration": {
"databaseConnectionConfiguration": {
"databaseType": "MYSQL",
"url": "127.0.0.1",
"portNumber": 3306,
"user": "youruser",
"password": "yourpassword",
"databaseName": "xplain_training"
},
"attributeTableName": "citiescatalog",
"attributeName": "city",
"keyFieldNameInAttributeTable": "id",
"hierarchyLevelColumnNames": ["state","city", "id"],
"hierarchyLevelNames": ["State", "City", "CityID"]
},
"saveToFile": "citycatalog.xattribute"
}]
}]
Again, execute this script:
java -jar /pathto/xplain.jar -g import.xgenscript
Now, if you check your my_xplain_dir/data
folder you should find two
files: atccatalog.xattribute
and citycatalog.xattribute
Import transaction data
Now that we imported our catalogs we will import our transaction data.
Let’s start with the main object: the patient table. In your config
folder generate a file patients.xtableconfig
and copy the following
content:
{
"databaseConnectionConfiguration" : {
"databaseType": "MYSQL",
"url": "127.0.0.1",
"portNumber": 3306,
"user": "youruser",
"password": "yourpassword",
"databaseName": "xplain_training"
},
"dbTableName" : "patients",
"objectName" : "Patients",
"dbKey" : "id",
"dbForeignKeys" : [ ],
"dimensionConfigurations" : [
{
"dbColumnName" : "id",
"dimensionName" : "Patient",
"xplainDataType" : "LONG"
}, {
"dbColumnName" : "gender",
"dimensionName" : "Gender",
"xplainDataType" : "CATEGORIAL"
}, {
"dbColumnName" : "Age",
"dimensionName" : "Age",
"xplainDataType" : "INT"
}, {
"dbColumnName" : "cityID",
"dimensionName" : "City",
"xplainDataType" : "CATEGORIAL"
}
]
}
This will later on import all patient data from our data source. Now,
generate a file prescriptions.xtableconfig
(again in your
\config
folder) and copy the following content:
{
"databaseConnectionConfiguration" : {
"databaseType": "MYSQL",
"url": "127.0.0.1",
"portNumber": 3306,
"user": "youruser",
"password": "yourpassword",
"databaseName": "xplain_training"
},
"dbTableName" : "prescriptions",
"objectName" : "Prescriptions",
"dbKey" : "id",
"dbForeignKeys" : ["patId"],
"dimensionConfigurations" : [
{
"dbColumnName" : "patId",
"dimensionName" : "patId",
"xplainDataType" : "LONG"
},{
"dbColumnName" : "id",
"dimensionName" : "id",
"xplainDataType" : "LONG"
},
{
"dbColumnName" : "date",
"dimensionName" : "Date",
"xplainDataType" : "LONG"
}, {
"dbColumnName" : "costs",
"dimensionName" : "Costs",
"xplainDataType" : "LONG"
}, {
"dbColumnName" : "pzn",
"dimensionName" : "PZN",
"xplainDataType" : "CATEGORIAL"
}
]
}
These to .xtableconfig files describe which table attributes will be imported from which tables. Now, the next step is to adopt your import.xgenscript file in order to define how to import these data. Add the following lines to this file:
"tasks" : [
{
"name" : "Generate Patient XTable",
"maxTaskThreads" : 3,
"parallelTasks" : [
{
"task" : "buildXTable",
"configurationFile" : "patients.xtableconfig",
"saveToFile" : "Patients.xtable"
}
]
}
],
"tasks" : [
{
"name" : "Generate Prescriptions XTable",
"maxTaskThreads" : 3,
"parallelTasks" : [
{
"task" : "buildXTable",
"configurationFile" : "prescriptions.xtableconfig",
"saveToFile" : "Prescriptions.xtable"
}
]
}
]
Again, execute this script:
java -jar /pathto/xplain.jar -g import.xgenscript
Now, if you check your my_xplain_dir/data
folder you should find
four files: atccatalog.xattribute
, citycatalog.xattribute
,
patients.xtable
and prescriptions.xtable
. The first two files
contain data stemming from the ATC catalog and the City catalog. The
last two files contain the corresponding transactional data, i.e. all
patient data and prescription data.