Tutorial
This tutorial walks through the main features of the Xplain Python package with practical examples.
1. Connecting to Xplain
Quick Start (Recommended):
Use create_session() with credentials from environment variables or a config file:
from xplain import create_session
# Uses credentials from environment variables or ~/.xplainpyrc
session = create_session()
# Load a startup configuration
session.startup("MyStartupConfig")
# View the object tree structure
session.show_tree()
See Authentication & Credential Management for secure credential management.
Direct Connection (legacy):
For cases where credentials cannot be stored in a profile, the legacy
Xsession constructor accepts them inline. See Legacy
for the full reference.
from xplain import Xsession
# Password authentication
session = Xsession(url="http://myhost:8080", user="myuser", password="mypassword")
# JWT authentication
session = Xsession(
    url="https://prod.company.com",
    user="myuser",
    jwt_dispatch_url="https://prod.company.com/jwt/dispatch",
    jwt_cookie_name="XPLAIN_JWT",
    jwt_token="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
)
Using Existing Sessions:
You can also connect to an existing session:
# Load from an existing session ID (e.g., from Xplain Object Explorer)
session.load_from_session_id("30FA4025AC5AE68852D803838FAA503D")
# Get the current session ID to share with other clients
print(session.get_session_id())
Multiple Environments:
Use profiles for different environments:
from xplain import create_session
# Development
dev_session = create_session(profile='local')
# Production
prod_session = create_session(profile='production')
2. Exploring the Object Tree
Xplain organizes data in a hierarchical tree of objects, dimensions, and attributes.
Using the session directly:
# Get metadata for an object
obj_info = session.get_object_info("Person")
# Get metadata for a dimension
dim_info = session.get_dimension_info("Person", "Gender")
# Get metadata for an attribute
attr_info = session.get_attribute_info("Person", "Gender", "Gender")
# Get the full tree details (object -> dimension -> attribute)
tree = session.get_tree_details()
Using the object-oriented API:
# Get the root object
root = session.get_root_object()
# Get a specific object
obj = session.get_xobject("Person")
# List dimensions
dimensions = obj.get_dimensions() # Returns list of dimension names
# List child objects
children = obj.get_child_objects() # Returns list of child object names
3. Opening Attributes
Open an attribute to see counts grouped by its hierarchy:
# Count records grouped by Gender
df = session.open_attribute(
    object_name="Person",
    dimension_name="Gender",
    attribute_name="Gender"
)
print(df)
# Returns a pandas DataFrame with counts per state
4. Building and Executing Queries
Use Query_config to build queries programmatically:
from xplain import Query_config
# Create a query configuration
query = Query_config()
# Add an aggregation (what to measure)
query.add_aggregation(
    object_name="Orders",
    dimension_name="Revenue",
    type="SUM"
)
# Add a group-by (how to group results)
query.add_groupby(
    object_name="Orders",
    dimension_name="Category",
    attribute_name="ProductType"
)
# Add a selection (filter)
query.add_selection(
    object_name="Orders",
    dimension_name="OrderDate",
    attribute_name="OrderDate",
    selected_states=["2024-01"]
)
# Execute and get results as a DataFrame
df = session.execute_query(query)
print(df)
Supported aggregation types:
SUM - Sum of values
AVG - Average
COUNT - Count of records
COUNTDISTINCT - Count of distinct values
COUNTENTITY - Count of entities
MAX - Maximum value
MIN - Minimum value
VAR - Variance
STDEV - Standard deviation
QUANTILE - Quantile
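A typo in the type string is easy to make, so it can help to check the value locally before building a query. The following is a minimal sketch that assumes the ten names listed above are the complete set; SUPPORTED_AGGREGATIONS and check_aggregation_type are hypothetical helpers and not part of the xplain package:

```python
# Hypothetical helper, not part of the xplain package: the set below simply
# mirrors the aggregation types listed in this tutorial.
SUPPORTED_AGGREGATIONS = frozenset({
    "SUM", "AVG", "COUNT", "COUNTDISTINCT", "COUNTENTITY",
    "MAX", "MIN", "VAR", "STDEV", "QUANTILE",
})


def check_aggregation_type(agg_type: str) -> str:
    """Return the upper-cased, stripped type name or raise ValueError."""
    normalized = agg_type.strip().upper()
    if normalized not in SUPPORTED_AGGREGATIONS:
        raise ValueError(
            f"Unsupported aggregation type {agg_type!r}; "
            f"expected one of {sorted(SUPPORTED_AGGREGATIONS)}"
        )
    return normalized
```

The returned string could then be passed as the type argument of add_aggregation.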
You can also execute queries using raw JSON:
query_json = {
    "aggregations": [{
        "object": "Orders",
        "dimension": "Revenue",
        "type": "SUM"
    }],
    "groupBys": [{
        "subGroupings": [{
            "attribute": {
                "object": "Orders",
                "dimension": "Category",
                "attribute": "ProductType"
            }
        }]
    }]
}
df = session.execute_query(query_json)
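Writing the nested dictionary by hand gets repetitive when several queries share the same shape. The sketch below assembles the structure shown above from plain tuples. build_query_json is a hypothetical helper, and emitting one "groupBys" entry per attribute is an assumption extrapolated from the single example, not a documented rule:

```python
from typing import Any, Dict, Iterable, Tuple

Aggregation = Tuple[str, str, str]  # (object, dimension, type)
GroupBy = Tuple[str, str, str]      # (object, dimension, attribute)


def build_query_json(
    aggregations: Iterable[Aggregation],
    group_bys: Iterable[GroupBy],
) -> Dict[str, Any]:
    """Assemble a raw query dict in the shape used by the example above.

    Hypothetical helper: one "groupBys" entry with a single sub-grouping
    is created per attribute, which is an assumption.
    """
    return {
        "aggregations": [
            {"object": obj, "dimension": dim, "type": agg_type}
            for obj, dim, agg_type in aggregations
        ],
        "groupBys": [
            {
                "subGroupings": [
                    {"attribute": {"object": obj, "dimension": dim, "attribute": attr}}
                ]
            }
            for obj, dim, attr in group_bys
        ],
    }


# Rebuilds the dictionary from the example above.
revenue_by_type = build_query_json(
    aggregations=[("Orders", "Revenue", "SUM")],
    group_bys=[("Orders", "Category", "ProductType")],
)
```

The resulting dictionary has the same content as query_json above and would be passed to session.execute_query() in the same way.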
5. Fluent Query Builder
XplainSession.query_builder() returns a QueryBuilder that lets you
compose the same query in a single expression using method chaining. It is an
alternative to Query_config; both produce identical requests
to the server.
Basic usage — attribute auto-resolved:
df = (
    session.query_builder(name="lab_counts")
    .aggregate(object="Lab Events", type="COUNT")
    .groupby(attribute="TestType")  # object/dimension found automatically
    .execute()
)
With selection filter and named group-by level:
df = (
    session.query_builder()
    .aggregate(object="Orders", type="SUM", dimension="Revenue")
    .groupby(attribute="Month", groupby_level_name="Month")
    .selection(attribute="Year", selected_states=["2024"])
    .execute()
)
Live query — stays open and updates when session selections change:
df = (
    session.query_builder(name="live_revenue")
    .aggregate(object="Orders", type="SUM", dimension="Revenue")
    .groupby(attribute="Category")
    .open()
)
Disambiguation — if an attribute appears in more than one object the
builder raises a ValueError listing all candidates:
# ValueError: Attribute 'Date' is ambiguous — found in multiple locations:
# object='Lab Events', dimension='Date'
# object='Orders', dimension='OrderDate'
# → pass object/dimension to disambiguate:
df = (
    session.query_builder()
    .aggregate(object="Lab Events", type="COUNT")
    .groupby(attribute="Date", object="Lab Events", dimension="Date")
    .execute()
)
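To make the auto-resolution and ambiguity behavior concrete, here is a minimal pure-Python sketch of the idea. It is an illustration only and not the QueryBuilder's actual implementation. The nested-dict tree layout, SAMPLE_TREE, the "Test" dimension name, and resolve_attribute are all hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical tree layout: {object: {dimension: [attribute, ...]}}
Tree = Dict[str, Dict[str, List[str]]]

SAMPLE_TREE: Tree = {
    "Lab Events": {"Test": ["TestType"], "Date": ["Date"]},
    "Orders": {"OrderDate": ["Date"], "Category": ["ProductType"]},
}


def resolve_attribute(tree: Tree, attribute: str) -> Tuple[str, str]:
    """Return the unique (object, dimension) pair that holds `attribute`."""
    matches = [
        (obj, dim)
        for obj, dims in tree.items()
        for dim, attrs in dims.items()
        if attribute in attrs
    ]
    if not matches:
        raise LookupError(f"Attribute {attribute!r} not found")
    if len(matches) > 1:
        # List every candidate so the caller knows what to pass explicitly.
        candidates = "\n".join(
            f"  object={obj!r}, dimension={dim!r}" for obj, dim in matches
        )
        raise ValueError(
            f"Attribute {attribute!r} is ambiguous, found in multiple locations:\n"
            f"{candidates}"
        )
    return matches[0]
```

With this sample tree, "TestType" resolves to a single location, while "Date" raises a ValueError naming both candidates, which is the situation where object and dimension have to be passed explicitly.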
Equivalent Query_config for comparison:
from xplain import Query_config
query = Query_config()
query.add_aggregation(object_name="Lab Events", dimension_name=None, type="COUNT")
query.add_groupby(attribute_name="TestType")
df = session.execute_query(query)
6. Working with Selections
Apply selections (filters) to the session:
# Apply a selection using the run method
session.run({
    "method": "select",
    "selections": [{
        "attribute": {
            "object": "Orders",
            "dimension": "Category",
            "attribute": "ProductType"
        },
        "selectedStates": ["Electronics", "Books"]
    }]
})
# View current selections
selections = session.get_selections()
# Download selections for specific objects
sel = session.download_selections(objects=["Orders"])
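When several attributes are filtered at once, the request dictionary for the select method can be generated instead of typed out. A small sketch, assuming the payload shape shown above; build_select_request is a hypothetical helper and not part of the xplain package:

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

AttributePath = Tuple[str, str, str]  # (object, dimension, attribute)


def build_select_request(
    selections: Mapping[AttributePath, Sequence[str]],
) -> Dict[str, Any]:
    """Build a "select" request dict in the shape used by session.run above.

    Hypothetical helper: maps each (object, dimension, attribute) path to
    the states that should be selected for it.
    """
    return {
        "method": "select",
        "selections": [
            {
                "attribute": {"object": obj, "dimension": dim, "attribute": attr},
                "selectedStates": list(states),
            }
            for (obj, dim, attr), states in selections.items()
        ],
    }


select_request = build_select_request({
    ("Orders", "Category", "ProductType"): ["Electronics", "Books"],
})
```

The resulting select_request matches the dictionary passed to session.run() in the example above.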
Using the advanced API:
api = session.get_api()
# Select with automatic object/dimension lookup
api.select(
    attribute_name="ProductType",
    selected_states=["Electronics", "Books"]
)
# Clear all selections
api.clear_all_selections()
7. Aggregation Dimensions
Add computed aggregation dimensions to objects:
# Using XObject
obj = session.get_xobject("Person")
obj.add_aggregation_dimension(
    dimension_name="#Prescriptions",
    aggregation={
        "aggregationType": "COUNT",
        "object": "Prescription"
    },
    floating_semantics=False
)
# View the result
df = session.open_attribute("Person", "#Prescriptions", "#Prescriptions - Log-Ranges")
print(df)
8. Relative Time Dimensions
Create relative time dimensions for temporal analysis:
# Example: Time from diagnosis to death
session.run({
    "method": "select",
    "selections": [{
        "attribute": {
            "object": "Diagnosis",
            "dimension": "Type",
            "attribute": "DiagnosisType"
        },
        "selectedStates": ["Cardiovascular"]
    }]
})
session.run({
    "method": "addRelativeTimeDimensions",
    "baseObject": "Person",
    "floatingSemantics": True,
    "names": ["TimeToEvent"],
    "referenceTimeDimension": {
        "object": "Diagnosis",
        "dimension": "Date"
    },
    "relativeTimeType": "TO_FIRST",
    "timeDimensions": [{
        "object": "Person",
        "dimension": "DeathDate"
    }]
})
df = session.open_attribute("Person", "TimeToEvent", "TimeToEvent")
print(df)
9. Exporting Instance Data
Export detailed instance data as a DataFrame, which can then be saved to CSV:
elements = [
    {"object": "Person"},
    {"object": "Diagnosis", "dimension": "Type"},
    {"object": "Prescription", "dimension": "Drug", "attribute": "ATC"}
]
df = session.get_instance_as_dataframe(elements)
print(df)
# Save to CSV
df.to_csv("export.csv", index=False)
10. Predictive Modeling
Work with predictive models:
# Import a model
session.run({
    "method": "importModel",
    "fileName": "MyModel.xmodelresult"
})
# List loaded models
models = session.get_model_names()
# Get model variables
variables = session.get_variable_list("MyModel")
# Get detailed variable information
details = session.get_variable_details("MyModel")
print(details)
11. Statistical Modeling with statsmodels
Run statistical models on query results:
# Get data
df = session.execute_query(query)
# Build a formula
formula = session.build_formula(
    response="outcome",
    predictors=["age", "gender", "treatment"]
)
# Run logistic regression
result = session.run_statsmodels(df, formula, model_type="logit")
print(result.summary())
# Other supported model types:
# "ols" - Ordinary Least Squares
# "probit" - Probit regression
# "mnlogit" - Multinomial logit
# "glm" - Generalized Linear Model
# "poisson" - Poisson regression
# "negative_binomial" - Negative Binomial regression
# Create a contingency table
table = session.create_contingency_table(df, "Gender", "Treatment")
print(table)
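A contingency table cross-tabulates two categorical columns by counting how often each pair of values occurs together. The sketch below shows that idea in plain Python on a few made-up rows. It is independent of session.create_contingency_table, whose exact output format is not shown in this tutorial; contingency_counts and the sample rows are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def contingency_counts(
    rows: Iterable[Mapping[str, str]],
    row_key: str,
    col_key: str,
) -> Dict[str, Dict[str, int]]:
    """Count co-occurrences of row_key and col_key values across rows."""
    counts = Counter((row[row_key], row[col_key]) for row in rows)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    # Fill in zeros so every row has an entry for every column label.
    return {
        r: {c: counts.get((r, c), 0) for c in col_labels}
        for r in row_labels
    }


# Made-up sample data for illustration only.
sample_rows = [
    {"Gender": "F", "Treatment": "A"},
    {"Gender": "F", "Treatment": "B"},
    {"Gender": "M", "Treatment": "A"},
    {"Gender": "F", "Treatment": "A"},
]
sample_table = contingency_counts(sample_rows, "Gender", "Treatment")
```

Each outer key is a Gender value and each inner dictionary holds the count per Treatment value, with zeros for combinations that never occur.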
12. Session Management
# Refresh session state from the server
session.refresh()
# List open queries
queries = session.get_queries()
# Get result of a specific query
df = session.get_result("query_name")
# Control broadcast behavior (for multi-client environments)
session.set_default_broadcast(True) # Notify other clients of changes
# Terminate the session
session.terminate()
13. Working with Analyses
# List available analyses
analyses = session.list_analyses()
# Load an analysis (includes startup config and saved state)
object_structure = session.load_analysis("MyAnalysis")
# Resume a stored analysis session
session.resume_analysis("MyAnalysis")
14. Visualization
Render an interactive collapsible tree in Jupyter:
# In a Jupyter notebook
session.collapsible_tree()