1. Generic Artefacts
When you work with the session API you often need to reference an object, a dimension or an attribute, and you often need to define aggregation functions, group-bys and selections. The following sections describe how those artefacts are encoded in JSON. JSON is often somewhat verbose. The JavaScript interface (see the corresponding section on Xplain.db) offers some more compact notions.
Referencing Objects, Dimensions and Attributes
Each object within the hierarchy of objects has a unique name. In other words, the name of an object is a unique identifier for an object within a session.
Like the fields of a table in a database, the dimensions of an object need to have unique names. Different objects, however, may have dimensions with the same. That means the combination of object name and dimension name is a unique identifier for a dimension within a session. A dimension is therefore referenced as:
{ "object": "Diagnosis", "dimension": "Diagnosis Code" }
Likewise, attributes attached to a dimension need to have unique names, different dimensions, however, may have attributes with the same name. The triple object name, dimension name and attribute name is a unique identifier for an attribute within a session and is referenced as:
{ "object": "Diagnosis", "dimension": "Diagnosis Code", "attribute": "ICD Categorization" }
Finally, the states of an attribute have unique names across the entire hierarchy of states. A unique reference to a state is:
{ "object": "Diagnosis", "dimension": "Diagnosis Code", "attribute": "ICD Categorization", "state": "11 Diseases of the circulatory system" }
Elements of a Request: Aggregation, GroupBy and Selection
Lets start with a typical SQL statement which aggregates data. An example reads like:
select SUM(price) from sales GROUP BY region WHERE gender = "male"
The three important parts are: 1) The aggregation function and what measure to aggregate: SUM(price) 2) The group by statement: GROUP BY region 3) And the WHERE clause: WHERE gender = “male”
In a Xplain request you need to define the corresponding three elements. The following describes the generic JSON format of an “aggregation”, a “group-by” and a “selection”.
Aggregation
COUNT
A simple example for an aggregation is counting the number of instances for an object - in this case count the number of patients stored:
{
"aggregationType": "COUNT",
"object": "Patient"
"aggregationName": "Number Patients" // optional argument
}
The SQL counterpart would be counting the number of lines in the table “Patient”. The argument “aggregationName” is optional. If you do not specify the name, it will be generated. The “aggregationName” is comparable to the “AS” keyword in SQL: if you give it a name you can later on reference this result column by that name.
The aggregation type always needs to be given. Available aggregation types are:
COUNT
, COUNTDISTINCT
, COUNTENTITY
, SUM
, AVG
, MIN
, MAX
, VAR
, VARP
, STDEV
, STDEVP
, FIRSTKEY
Depending on the type additional arguments need to be provided.
COUNTDISTINCT
Counts distinct attribute values:
{
"aggregationType": "COUNTDISTINCT",
"object": "Sales",
"dimension": "Product", // deprecated alternative argument: "measure"
"attribute": "Product Category",
"states": ["Hardware", "Software" ]
}
The above example counts the number of different product categories. The argument “states” is optional. If it is given, only the number of different values of the given states (= values of the attribute) is computed. If the states argument is not give all different attribute values are counted. If the attribute has an hierarchy, the different values on the first level of the hierarchy are counted.
You may also define a different level on which you would like to count the distinct states by explicitly providing the hierarchy level name:
{
"aggregationType": "COUNTDISTINCT",
"object": "Prescription",
"dimension": "Prescribed Product",
"attribute": "ATC Category",
"level": "ATC Level 3"
}
The above example counts the number of different different ATC level 3 categories.
COUNTENTITY
This is specific for counting distinct object keys:
{
"aggregationType": "COUNTENTITY",
"object": "Sales",
"dimension": "CustomerID" // deprecated alternative argument: "measure"
}
Counting distinct object keys can be done much more efficiently than the standard count distinct. It works even for a very large number of different keys, for example for millions of different customer IDs. The “dimension” argument needs to provide the name of the dimension which holds the key, and “object” is the associated object. In case the dimension is the primary key of the associated object, the count is equivalent to the standard COUNT aggregation function.
SUM and AVG
Sums the values of a numeric dimension / computes the average:
{
"aggregationType": "SUM", // or "AVG"
"object": "Sales",
"dimension": "Price" // deprecated alternative argument: "measure"
}
MIN and MAX
Computes the minimum or the maximum of the values of a numeric dimension:
{
"aggregationType": "MIN", // or "MAX"
"object": "Sales",
"dimension": "Price" // deprecated alternative argument: "measure"
}
VAR, VARP, STDEV and STDEVP
Those for aggregation methods compute the variance or the standard deviation of the values of a numeric dimension (measure). For each there are two versions existing - the version with suffix “P” (VARP and STDEVP) is the unbiased estimate (“Erwartungstreuer Schaetzer”):
{
"aggregationType": "VAR", // or VARP, STDEV, STDEVP
"object": "Sales",
"dimension": "Price" // deprecated alternative argument: "measure"
}
FIRSTKEY and INNERFIRST
Those aggregation methods are still in a beta status - ask the Xplain team how to use this.
GroupBy
A group-by refers to an attribute and a set of states of the attribute. An example of the JSON encoding of a group-by is:
{
"attribute": { "object": "Diagnosis", "dimension": "Date", "attribute": "Local Date CET" },
"groupByLevel" : "Quarters",
"includeUpperLevels" : false
}
Many attributes have a hierarchy of states. A date attribute typically has a hierarchy of years, quarters, months and days. The argument “groupByLevel” specifies the hierarchy level of the group-by states, in the above example a group-by quarters. This argument is optional. If it is not give the first hierarchy level is used as a default. If “includeUpperLevels” is set to “true” then also the hierarchy levels above will be included into the group-by (e.g. quarters and years).
Instead of defining the group-by states implicitly by means of the hierarchy level, you may also define them in terms of an explicit list of states, for example the group-by:
{
"attribute": { "object": "Diagnosis", "dimension": "Date", "attribute": "Local Date CET" },
"groupByStates" : ["2016", "2017"]
}
will group data only by the years 2016 and 2017.
All examples so far refer to “one-dimensional” group-bys. You may define a multi-dimensional grouping by putting the additional “one-dimensional” group-bys into an array named “subGroupings”, e.g. additionally group by all combinations of age and gender:
{
"attribute": { "object": "Diagnosis", "dimension": "Date", "attribute": "Local Date CET" },
"groupByLevel" : "Quarters",
"subGroupings": [
{ "attribute": { "object": "Customer", "dimension": "Gender", "attribute": "Gender" } },
{ "attribute": { "object": "Customer", "dimension": "Age", "attribute": "Age-Group" } }
]
}
The above multi-dimensional group-by defines all combinations between quarters, gender values and age-groups. It is equivalent to putting all three one-dimensional group-bys into the array of sub-groupings:
{
"subGroupings": [
{ "attribute": { "object": "Diagnosis", "dimension": "Date", "attribute": "Local Date CET" },
“groupByLevel" : "Quarters" },
{ "attribute": { "object": "Customer", "dimension": "Gender", "attribute": "Gender" } },
{ "attribute": { "object": "Customer", "dimension": "Age", "attribute": "Age-Group" } }
]
}
Selection
Within the terminology of Xplain, a “selection” refers to a condition on an object, in other domains also called a “filter criterion” or simply the “where clause” in an SQL statement. A selection refers to an attribute and list the values / states of the attribute which are included into the selection:
{
"attribute": { "object": "Diagnosis", "dimension": "Diagnosis Code", "attribute": "ICD Categorization" },
"selectedStates" : ["Diabetes", "Measles"]
}
Defining a Request
A request consists of an array of aggregations, an array of group-bys (each one- or multi-dimensional) and an array of selections:
{
"requestID": "My Request", // deprecated alternative argument "requestName"
"aggregations": [ {/* an aggregation object */}, { }, ... ],
"groupBys": [ {/* a group by object */}, { }, ... ],
"selections" : [ {/* a selection object */}, { }, ...],
"externalSelections": "name of selection set"
}
A request needs to have an ID. Only by means of the unique ID you can find the results for a request in the session document which is returned by the server after computing the request.
In addition or instead of the “selections” parameter you may provide the optional parameter “externalSelections”. This name refers to selections shared by multiple requests, i.e., selection sets defined in an Xplain session. Per default, in a session there exists the selection set “globalSelections”. You may define addiional selection sets using the method “addSelectionSet”. If externalSelections are give, the request pulls those selections from the given selection set before computing the query results. If selections and externalSelections are give, the given selections prevail - only selections on attributes not given explicitly in “selections” are dyamically pulled from the exernalSelections. This concept is the basis for the inteactive usability oncept of the Xplain Object Explorer user inerface.