Importing Data with XplainPy

This tutorial covers importing data from a relational database into Xplain as XTables and building an XView that defines the object hierarchy for analysis.

Concepts

Before running any code it helps to understand the four building blocks of a Xplain data import:

Concept

Description

Connection

JDBC credentials for a source database (PostgreSQL, MySQL, Oracle, …). Created once, saved to a file, reused across imports.

XTable

A binary fact table produced by importing one database table (or SQL expression). Columns become dimensions; one column is the primary key.

XAttribute

A hierarchical dimension added on top of an XTable (e.g. a 3-level geography hierarchy stored in a lookup table).

XView

A startup configuration that assembles one or more XTables into an object hierarchy (parent → child), ready to load in XOE or via Python.

The Importer class is the main entry point. It provides factory methods for all four building blocks and keeps the session wired to every object it creates.

Xsession
    └── Importer
            ├── create_connection()      →  Connection
            ├── create_xtable_config()   →  Xtable_config  →  import_xtable()
            ├── create_xattribute_config() → Xattribute_config → import_xattribute()
            └── create_xview()           →  Xview  →  save()  →  startup()

Prerequisites

  • xplain package installed (pip install xplain)

  • An Xplain server is running and accessible

  • A source database is reachable from the Xplain server

from xplain import Xsession
from xplain.tools import Importer

session = Xsession(url="http://myhost:8080", user="admin", password="secret")
importer = Importer(session)

Step 1 — Create a database connection

Pass your JDBC credentials to create_connection(). Call test_connection() immediately to validate them before running a full import.

conn = importer.create_connection(
    databaseType="POSTGRESQL",
    url="db.example.com",
    user="analyst",
    password="dbpassword",
    databaseName="warehouse",
    portNumber=5432
)

result = conn.test_connection()
print(result)   # {'status': 'success'} or {'status': 'error', 'message': '…'}

Save the connection for reuse

Saving avoids storing credentials in every script. Future imports can load by name instead of re-entering credentials.

conn.save("warehouse_pg", "PUBLIC")

Load a saved connection

conn = importer.create_connection(load_from="warehouse_pg")

Explore the database schema

# List all tables in the connected database
tables = conn.get_tables()
print(tables)

# Preview the first 10 rows of a table
df = conn.get_table_content("orders", limit=10)
print(df)

Step 2 — Configure and import an XTable

Create an Xtable_config by pointing it at a table. The server auto-detects column types and populates dimensionConfigurations.

config = importer.create_xtable_config(
    connection=conn,
    db_table_name="orders",
    object_name="Orders",
    primary_key="order_id",
    foreign_keys=["customer_id", "product_id"]
)

Review the auto-detected mapping

print(config.show_dimension_configurations())
#    dbColumnName   dimensionName  xplainDataType  primaryKey  foreignKey
# 0  order_id       order_id       LONG            X
# 1  customer_id    customer_id    LONG                        X
# 2  product_id     product_id     LONG                        X
# 3  order_date     order_date     CATEGORIAL
# 4  quantity       quantity       LONG
# 5  total_amount   total_amount   LONG

Adjust dimension names and types

Column names from the database are used as-is. Override them to match your analytical naming conventions:

from xplain.tools import XplainType

# Rename dimensions
config.set_dimension_name("order_date", "OrderDate")
config.set_dimension_name("total_amount", "Revenue")

# Override types where auto-detection is wrong
config.set_dimension_xplain_data_type("quantity", XplainType.INT.value)

# Mark a date column so Xplain treats it as a time axis
config.set_time_dimension("OrderDate")

# Drop a column that is not needed for analysis
config.remove_dimension("internal_flag")

Import the XTable

result = config.import_xtable()
print(result)   # {'status': 'success'}

Save the configuration for later re-imports

config.save("orders_import", "PUBLIC")

# Reload and re-run at any time:
config2 = importer.create_xtable_config(load_from="orders_import")
config2.import_xtable()

Step 3 — Import child tables

Repeat Step 2 for every table that will appear as a child in the object hierarchy. Use foreign_keys to declare which columns link to parent tables.

# Customers table  (root's child)
customers_cfg = importer.create_xtable_config(
    connection=conn,
    db_table_name="customers",
    object_name="Customers",
    primary_key="customer_id"
)
customers_cfg.import_xtable()

# Products table  (root's child)
products_cfg = importer.create_xtable_config(
    connection=conn,
    db_table_name="products",
    object_name="Products",
    primary_key="product_id"
)
products_cfg.import_xtable()

Step 4 — Build an XView

An Xview assembles the XTables into the parent–child object hierarchy that Xplain uses during analysis. insert_xtable with as_root=True sets the root; subsequent calls attach children.

view = importer.create_xview()

# Root object
view.insert_xtable(
    xtable="Orders",
    as_root=True,
    auto_generate_attributes=True
)

# Child objects (linked via foreign keys declared during import)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.insert_xtable(xtable="Products",  as_root=False, parent="Orders")

# Save the XView as a startup configuration
view.save("orders_analysis", "PUBLIC")

Optional: add numeric range attributes

view.add_attribute_intervals(
    intervals_dict={"Revenue": (0, 100, 500, 1000, 5000, 50000)},
    object_name="Orders",
    attribute_name="_Group"
)
view.save("orders_analysis", "PUBLIC")

Step 5 — Load the XView for analysis

session.startup("orders_analysis")
session.show_tree()

Advanced options

Filter rows during import

Use where_clause to import only the rows you need. The clause is applied server-side so no unnecessary data is transferred.

config = importer.create_xtable_config(
    connection=conn,
    db_table_name="orders",
    object_name="RecentOrders",
    primary_key="order_id",
    foreign_keys=["customer_id"],
    where_clause="order_date >= '2023-01-01' AND status = 'completed'"
)
config.import_xtable()

Import from a SQL expression

Replace the table name with an arbitrary SQL query. Useful for views, joins, or computed columns that do not exist as a plain table.

config = importer.create_xtable_config(
    connection=conn,
    object_name="CustomerRevenue",
    primary_key="customer_id"
)
config.add_db_table_sql_expression("""
    SELECT
        c.customer_id,
        c.country,
        SUM(o.total_amount) AS lifetime_revenue,
        COUNT(o.order_id)   AS order_count
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.country
""")
config.import_xtable()

Add a derived dimension

Use add_dimension_with_sql_expression to compute a new column on the fly without modifying the source database.

config = importer.create_xtable_config(
    connection=conn,
    db_table_name="orders",
    primary_key="order_id"
)
config.add_dimension_with_sql_expression(
    "OrderYear",
    "EXTRACT(YEAR FROM order_date)",
    "LONG"
)
config.import_xtable()

Sample rows for testing

Import a fixed number of rows to validate the configuration before a full run.

config = importer.create_xtable_config(
    connection=conn,
    db_table_name="events",
    primary_key="event_id",
    sampling_rate=10000
)
config.import_xtable()

Import a pandas DataFrame

No database connection needed — pass an in-memory DataFrame directly.

import pandas as pd

df = pd.read_csv("sales_export.csv")

result = importer.import_dataframe(
    dataframe=df,
    object_name="SalesExport",
    primary_key="id",
    type_overrides={"category": "CATEGORIAL", "revenue": "LONG"}
)
print(result)   # {'status': 'success'}

For more control over the DataFrame import (inspect the mapping before committing), use the two-step form:

cfg = importer.create_dataframe_xtable_config(
    dataframe=df,
    object_name="SalesExport",
    primary_key="id"
)
print(cfg.show_dimension_configurations())
cfg.import_xtable()

Add a hierarchical attribute (XAttribute)

XAttributes add a multi-level dimension sourced from a lookup table — for example, a three-level product category hierarchy stored in a product_hierarchy table.

attr_cfg = importer.create_xattribute_config(
    connection_file="warehouse_pg",
    db_table_name="product_hierarchy",
    attribute_name="ProductCategory",
    primary_key="product_id",
    hierarchy_level_names=["Department", "Category", "SubCategory"],
    hierarchy_level_column_names=["dept_name", "cat_name", "subcat_name"]
)
importer.import_xattribute(attr_cfg)

Inspect what is already imported

# List saved connections
print(importer.list_saved_connections())

# List imported XTables
print(importer.list_xtables())

# List saved XTable import configurations
print(importer.list_xtable_configs())

# List imported XAttributes
print(importer.list_xattributes())

# Retrieve the original config used to create a specific XTable
print(importer.get_config_of_xtable("orders.xtable"))

Complete end-to-end example

from xplain import Xsession
from xplain.tools import Importer, XplainType

# 1. Connect to Xplain
session = Xsession(url="http://myhost:8080", user="admin", password="secret")
importer = Importer(session)

# 2. Create and save a database connection
conn = importer.create_connection(
    databaseType="POSTGRESQL",
    url="db.example.com",
    user="analyst",
    password="dbpassword",
    databaseName="warehouse",
    portNumber=5432
)
conn.test_connection()
conn.save("warehouse_pg", "PUBLIC")

# 3. Import root XTable
orders = importer.create_xtable_config(
    connection=conn,
    db_table_name="orders",
    object_name="Orders",
    primary_key="order_id",
    foreign_keys=["customer_id", "product_id"],
    where_clause="order_date >= '2022-01-01'"
)
orders.set_dimension_name("total_amount", "Revenue")
orders.set_time_dimension("order_date")
orders.import_xtable()

# 4. Import child XTables
importer.create_xtable_config(
    connection=conn,
    db_table_name="customers",
    object_name="Customers",
    primary_key="customer_id"
).import_xtable()

importer.create_xtable_config(
    connection=conn,
    db_table_name="products",
    object_name="Products",
    primary_key="product_id"
).import_xtable()

# 5. Build and save the XView
view = importer.create_xview()
view.insert_xtable(xtable="Orders", as_root=True, auto_generate_attributes=True)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.insert_xtable(xtable="Products",  as_root=False, parent="Orders")
view.save("orders_analysis", "PUBLIC")

# 6. Load and start analysing
session.startup("orders_analysis")
session.show_tree()