Tutorial

This tutorial walks through the main features of the Xplain Python package with practical examples.

1. Connecting to Xplain

Quick Start (Recommended):

Call create_session() to connect with credentials read from environment variables or a config file:

from xplain import create_session

# Uses credentials from environment variables or ~/.xplainpyrc
session = create_session()

# Load a startup configuration
session.startup("MyStartupConfig")

# View the object tree structure
session.show_tree()

See Authentication & Credential Management for how to store credentials securely.

Direct Connection (legacy):

For cases where credentials cannot be stored in a profile, the legacy Xsession constructor accepts them inline. See Legacy for the full reference.

from xplain import Xsession

# Password authentication
session = Xsession(url="http://myhost:8080", user="myuser", password="mypassword")

# JWT authentication
session = Xsession(
    url="https://prod.company.com",
    user="myuser",
    jwt_dispatch_url="https://prod.company.com/jwt/dispatch",
    jwt_cookie_name="XPLAIN_JWT",
    jwt_token="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
)

Using Existing Sessions:

You can also attach to a session that already exists, or share the current session's ID with another client:

# Load from an existing session ID (e.g., from Xplain Object Explorer)
session.load_from_session_id("30FA4025AC5AE68852D803838FAA503D")

# Get the current session ID to share with other clients
print(session.get_session_id())

Multiple Environments:

Use profiles for different environments:

from xplain import create_session

# Development
dev_session = create_session(profile='local')

# Production
prod_session = create_session(profile='production')
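
One way to pick the profile at run time is to map a deployment label to a profile name. The sketch below is only an illustration: the APP_ENV variable, its dev/prod values, and the pick_profile helper are hypothetical and not part of the Xplain package. Only the profile names 'local' and 'production' come from the example above.

```python
import os

# Hypothetical mapping from a deployment label to an Xplain profile name.
# 'local' and 'production' are the profile names used in this tutorial;
# APP_ENV and its values are assumptions made for this illustration.
PROFILE_BY_ENV = {"dev": "local", "prod": "production"}


def pick_profile(environ=None, default="local"):
    """Return the profile name for the current deployment label."""
    environ = os.environ if environ is None else environ
    label = environ.get("APP_ENV", "").strip().lower()
    if not label:
        return default
    if label not in PROFILE_BY_ENV:
        raise ValueError(f"Unknown APP_ENV value: {label!r}")
    return PROFILE_BY_ENV[label]


# Intended use (needs a configured profile, so it is left commented out):
# from xplain import create_session
# session = create_session(profile=pick_profile())
```

Failing loudly on an unknown label is a deliberate choice here, so that a typo cannot silently connect a script to the wrong environment.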

2. Exploring the Object Tree

Xplain organizes data in a hierarchical tree of objects, dimensions, and attributes.

Using the session directly:

# Get metadata for an object
obj_info = session.get_object_info("Person")

# Get metadata for a dimension
dim_info = session.get_dimension_info("Person", "Gender")

# Get metadata for an attribute
attr_info = session.get_attribute_info("Person", "Gender", "Gender")

# Get the full tree details (object -> dimension -> attribute)
tree = session.get_tree_details()

Using the object-oriented API:

# Get the root object
root = session.get_root_object()

# Get a specific object
obj = session.get_xobject("Person")

# List dimensions
dimensions = obj.get_dimensions()  # Returns list of dimension names

# List child objects
children = obj.get_child_objects()  # Returns list of child object names

3. Opening Attributes

Open an attribute to see counts grouped by its hierarchy:

# Count records grouped by Gender
df = session.open_attribute(
    object_name="Person",
    dimension_name="Gender",
    attribute_name="Gender"
)
print(df)

# Returns a pandas DataFrame with one count per attribute state (here, each Gender value)

4. Building and Executing Queries

Use Query_config to build queries programmatically:

from xplain import Query_config

# Create a query configuration
query = Query_config()

# Add an aggregation (what to measure)
query.add_aggregation(
    object_name="Orders",
    dimension_name="Revenue",
    type="SUM"
)

# Add a group-by (how to group results)
query.add_groupby(
    object_name="Orders",
    dimension_name="Category",
    attribute_name="ProductType"
)

# Add a selection (filter)
query.add_selection(
    object_name="Orders",
    dimension_name="OrderDate",
    attribute_name="OrderDate",
    selected_states=["2024-01"]
)

# Execute and get results as a DataFrame
df = session.execute_query(query)
print(df)

Supported aggregation types:

SUM - Sum of values
AVG - Average
COUNT - Count of records
COUNTDISTINCT - Count of distinct values
COUNTENTITY - Count of entities
MAX - Maximum value
MIN - Minimum value
VAR - Variance
STDEV - Standard deviation
QUANTILE - Quantile
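
Because the type is passed as a plain string, a typo is only noticed once the query is sent. A small client-side check against the list above can catch it earlier. This is a minimal sketch: normalize_aggregation_type is a hypothetical helper, not part of the Xplain package, and it assumes the names must be sent in upper case as shown in this tutorial.

```python
# Aggregation type names exactly as listed in this tutorial.
SUPPORTED_AGGREGATIONS = (
    "SUM", "AVG", "COUNT", "COUNTDISTINCT", "COUNTENTITY",
    "MAX", "MIN", "VAR", "STDEV", "QUANTILE",
)


def normalize_aggregation_type(name):
    """Return the upper-case aggregation type, or raise ValueError if unknown."""
    candidate = str(name).strip().upper()
    if candidate not in SUPPORTED_AGGREGATIONS:
        allowed = ", ".join(SUPPORTED_AGGREGATIONS)
        raise ValueError(
            f"Unsupported aggregation type {name!r}; expected one of: {allowed}"
        )
    return candidate


print(normalize_aggregation_type("sum"))  # SUM
```

The returned value could then be passed as the type argument of add_aggregation().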

You can also execute queries using raw JSON:

query_json = {
    "aggregations": [{
        "object": "Orders",
        "dimension": "Revenue",
        "type": "SUM"
    }],
    "groupBys": [{
        "subGroupings": [{
            "attribute": {
                "object": "Orders",
                "dimension": "Category",
                "attribute": "ProductType"
            }
        }]
    }]
}
df = session.execute_query(query_json)
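
The nested structure above is easy to mistype by hand, so it can help to generate it. The following is a minimal sketch that reproduces only the shape shown in this section; build_query_json and attribute_ref are hypothetical helpers, and emitting one groupBys entry per attribute is an assumption rather than documented behavior.

```python
def attribute_ref(object_name, dimension_name, attribute_name):
    """Build the object/dimension/attribute reference used inside a group-by."""
    return {
        "object": object_name,
        "dimension": dimension_name,
        "attribute": attribute_name,
    }


def build_query_json(aggregations, groupbys=()):
    """Build a raw query dict in the shape shown above.

    aggregations: iterable of (object, dimension, type) tuples
    groupbys:     iterable of (object, dimension, attribute) tuples
    """
    return {
        "aggregations": [
            {"object": obj, "dimension": dim, "type": agg_type}
            for obj, dim, agg_type in aggregations
        ],
        # Assumption: each attribute becomes its own groupBys entry
        # with a single subGrouping, mirroring the example above.
        "groupBys": [
            {"subGroupings": [{"attribute": attribute_ref(obj, dim, attr)}]}
            for obj, dim, attr in groupbys
        ],
    }


query_json = build_query_json(
    aggregations=[("Orders", "Revenue", "SUM")],
    groupbys=[("Orders", "Category", "ProductType")],
)
# query_json can then be passed to session.execute_query(query_json).
```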

5. Fluent Query Builder

XplainSession.query_builder() returns a QueryBuilder that lets you compose the same query in a single expression using method chaining. It is an alternative to Query_config, and both produce identical requests to the server.

Basic usage — attribute auto-resolved:

df = (
    session.query_builder(name="lab_counts")
    .aggregate(object="Lab Events", type="COUNT")
    .groupby(attribute="TestType")   # object/dimension found automatically
    .execute()
)

With selection filter and named group-by level:

df = (
    session.query_builder()
    .aggregate(object="Orders", type="SUM", dimension="Revenue")
    .groupby(attribute="Month", groupby_level_name="Month")
    .selection(attribute="Year", selected_states=["2024"])
    .execute()
)

Live query — stays open and updates when session selections change:

df = (
    session.query_builder(name="live_revenue")
    .aggregate(object="Orders", type="SUM", dimension="Revenue")
    .groupby(attribute="Category")
    .open()
)

Disambiguation: if an attribute name is found in more than one place in the object tree, the builder raises a ValueError listing all candidates. Pass object and dimension explicitly to resolve it:

# ValueError: Attribute 'Date' is ambiguous — found in multiple locations:
#   object='Lab Events', dimension='Date'
#   object='Orders', dimension='OrderDate'
# → pass object/dimension to disambiguate:
df = (
    session.query_builder()
    .aggregate(object="Lab Events", type="COUNT")
    .groupby(attribute="Date", object="Lab Events", dimension="Date")
    .execute()
)
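
To make the auto-resolution rule concrete, here is a toy model of the lookup. It is not the library's implementation: the nested-dict tree layout, the find_attribute helper, and the 'Test' dimension name are all made up for illustration. The point is only the rule itself: exactly one match resolves, several matches raise ValueError.

```python
def find_attribute(tree, attribute):
    """Return the single (object, dimension) pair that holds `attribute`.

    tree is a toy structure: {object_name: {dimension_name: [attribute_names]}}.
    """
    matches = [
        (obj, dim)
        for obj, dims in tree.items()
        for dim, attrs in dims.items()
        if attribute in attrs
    ]
    if not matches:
        raise LookupError(f"Attribute {attribute!r} not found")
    if len(matches) > 1:
        listing = "\n".join(
            f"  object={obj!r}, dimension={dim!r}" for obj, dim in matches
        )
        raise ValueError(
            f"Attribute {attribute!r} is ambiguous, found in:\n{listing}"
        )
    return matches[0]


# Toy tree loosely based on the names used in the examples above.
toy_tree = {
    "Lab Events": {"Date": ["Date"], "Test": ["TestType"]},
    "Orders": {"OrderDate": ["Date"], "Category": ["ProductType"]},
}

print(find_attribute(toy_tree, "TestType"))  # ('Lab Events', 'Test')
```

Looking up "Date" in the same toy tree raises ValueError, because it occurs under both Lab Events and Orders.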

Equivalent Query_config for comparison:

from xplain import Query_config

query = Query_config()
query.add_aggregation(object_name="Lab Events", dimension_name=None, type="COUNT")
query.add_groupby(attribute_name="TestType")
df = session.execute_query(query)

6. Working with Selections

Apply selections (filters) to the session:

# Apply a selection using the run method
session.run({
    "method": "select",
    "selections": [{
        "attribute": {
            "object": "Orders",
            "dimension": "Category",
            "attribute": "ProductType"
        },
        "selectedStates": ["Electronics", "Books"]
    }]
})

# View current selections
selections = session.get_selections()

# Download selections for specific objects
sel = session.download_selections(objects=["Orders"])
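
When the same kind of selection is applied repeatedly, the request dict can be generated instead of typed out. The sketch below reproduces the request shape shown above; build_select_request is a hypothetical helper, and rejecting an empty state list is this helper's own choice rather than a documented server rule.

```python
def build_select_request(object_name, dimension_name, attribute_name,
                         selected_states):
    """Build a 'select' request dict for session.run()."""
    # A bare string would otherwise be split into characters by list().
    if isinstance(selected_states, str):
        selected_states = [selected_states]
    states = list(selected_states)
    if not states:
        raise ValueError("selected_states must contain at least one state")
    return {
        "method": "select",
        "selections": [{
            "attribute": {
                "object": object_name,
                "dimension": dimension_name,
                "attribute": attribute_name,
            },
            "selectedStates": states,
        }],
    }


request = build_select_request(
    "Orders", "Category", "ProductType", ["Electronics", "Books"]
)
# request can then be passed to session.run(request).
```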

Using the advanced API:

api = session.get_api()

# Select with automatic object/dimension lookup
api.select(
    attribute_name="ProductType",
    selected_states=["Electronics", "Books"]
)

# Clear all selections
api.clear_all_selections()

7. Aggregation Dimensions

Add computed aggregation dimensions to objects:

# Using XObject
obj = session.get_xobject("Person")
obj.add_aggregation_dimension(
    dimension_name="#Prescriptions",
    aggregation={
        "aggregationType": "COUNT",
        "object": "Prescription"
    },
    floating_semantics=False
)

# View the result
df = session.open_attribute("Person", "#Prescriptions", "#Prescriptions - Log-Ranges")
print(df)

8. Relative Time Dimensions

Create relative time dimensions for temporal analysis:

# Example: Time from diagnosis to death
session.run({
    "method": "select",
    "selections": [{
        "attribute": {
            "object": "Diagnosis",
            "dimension": "Type",
            "attribute": "DiagnosisType"
        },
        "selectedStates": ["Cardiovascular"]
    }]
})

session.run({
    "method": "addRelativeTimeDimensions",
    "baseObject": "Person",
    "floatingSemantics": True,
    "names": ["TimeToEvent"],
    "referenceTimeDimension": {
        "object": "Diagnosis",
        "dimension": "Date"
    },
    "relativeTimeType": "TO_FIRST",
    "timeDimensions": [{
        "object": "Person",
        "dimension": "DeathDate"
    }]
})

df = session.open_attribute("Person", "TimeToEvent", "TimeToEvent")
print(df)

9. Exporting Instance Data

Export detailed instance data as a DataFrame, which can then be written to CSV:

elements = [
    {"object": "Person"},
    {"object": "Diagnosis", "dimension": "Type"},
    {"object": "Prescription", "dimension": "Drug", "attribute": "ATC"}
]

df = session.get_instance_as_dataframe(elements)
print(df)

# Save to CSV
df.to_csv("export.csv", index=False)
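
Each entry in the elements list names an object, optionally narrowed to a dimension and then to an attribute. A small constructor keeps that nesting consistent. This is a sketch under assumptions: make_element is a hypothetical helper, and the rule that an attribute requires a dimension is inferred from the example above, not taken from the package reference.

```python
def make_element(object_name, dimension_name=None, attribute_name=None):
    """Build one export element, leaving out the levels that are not given."""
    # Assumption inferred from the example: an attribute is always
    # addressed through its dimension.
    if attribute_name is not None and dimension_name is None:
        raise ValueError("attribute_name requires dimension_name")
    element = {"object": object_name}
    if dimension_name is not None:
        element["dimension"] = dimension_name
    if attribute_name is not None:
        element["attribute"] = attribute_name
    return element


elements = [
    make_element("Person"),
    make_element("Diagnosis", "Type"),
    make_element("Prescription", "Drug", "ATC"),
]
# elements can then be passed to session.get_instance_as_dataframe(elements).
```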

10. Predictive Modeling

Import a predictive model, then list the loaded models and inspect their variables:

# Import a model
session.run({
    "method": "importModel",
    "fileName": "MyModel.xmodelresult"
})

# List loaded models
models = session.get_model_names()

# Get model variables
variables = session.get_variable_list("MyModel")

# Get detailed variable information
details = session.get_variable_details("MyModel")
print(details)

11. Statistical Modeling with statsmodels

Run statistical models on query results:

# Get data; the query result must contain the columns named in the formula below
df = session.execute_query(query)

# Build a formula
formula = session.build_formula(
    response="outcome",
    predictors=["age", "gender", "treatment"]
)

# Run logistic regression
result = session.run_statsmodels(df, formula, model_type="logit")
print(result.summary())

# Other supported model types:
# "ols"              - Ordinary Least Squares
# "probit"           - Probit regression
# "mnlogit"          - Multinomial logit
# "glm"              - Generalized Linear Model
# "poisson"          - Poisson regression
# "negative_binomial" - Negative Binomial regression

# Create a contingency table
table = session.create_contingency_table(df, "Gender", "Treatment")
print(table)
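
For orientation, the statsmodels formula API takes a Patsy-style formula string of the form "response ~ term + term". The sketch below builds such a string by hand. It is an assumption that session.build_formula() produces something of this form; make_formula is a hypothetical stand-in written for this illustration, not the package's implementation.

```python
def make_formula(response, predictors, categorical=()):
    """Build a Patsy-style formula string such as 'y ~ a + C(b)'.

    Predictors named in `categorical` are wrapped in C(...), the
    formula-syntax marker for treating a column as categorical.
    """
    predictors = list(predictors)
    if not predictors:
        raise ValueError("at least one predictor is required")
    terms = [f"C({p})" if p in categorical else p for p in predictors]
    return f"{response} ~ " + " + ".join(terms)


print(make_formula("outcome", ["age", "gender", "treatment"]))
# outcome ~ age + gender + treatment
```

Column names that contain spaces or other non-identifier characters would need quoting in a real formula, which this sketch does not handle.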

12. Session Management

# Refresh session state from the server
session.refresh()

# List open queries
queries = session.get_queries()

# Get result of a specific query
df = session.get_result("query_name")

# Control broadcast behavior (for multi-client environments)
session.set_default_broadcast(True)  # Notify other clients of changes

# Terminate the session
session.terminate()
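
If a script fails partway through, terminate() may never be reached. One way to guard against that is a small context manager built on the standard library. This is a sketch, not a feature of the Xplain package: managed_session is a hypothetical helper, and FakeSession exists only so the example runs without a server. The only session method it relies on is terminate(), shown above.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(session):
    """Yield the session and always call session.terminate() afterwards."""
    try:
        yield session
    finally:
        session.terminate()


class FakeSession:
    """Stand-in used only to make this sketch runnable without a server."""

    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True


fake = FakeSession()
with managed_session(fake) as active:
    pass  # queries would run here, using `active`
print(fake.terminated)  # True
```

With a real session, the with statement would wrap the object returned by create_session() in the same way.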

13. Working with Analyses

# List available analyses
analyses = session.list_analyses()

# Load an analysis (includes startup config and saved state)
object_structure = session.load_analysis("MyAnalysis")

# Resume a stored analysis session
session.resume_analysis("MyAnalysis")

14. Visualization

Render an interactive collapsible tree in Jupyter:

# In a Jupyter notebook
session.collapsible_tree()