Importing Data with XplainPy
This tutorial covers importing data from a relational database into Xplain as XTables and building an XView that defines the object hierarchy for analysis.
On this page
Concepts
Before running any code it helps to understand the four building blocks of a Xplain data import:
Concept |
Description |
|---|---|
Connection |
JDBC credentials for a source database (PostgreSQL, MySQL, Oracle, …). Created once, saved to a file, reused across imports. |
XTable |
A binary fact table produced by importing one database table (or SQL expression). Columns become dimensions; one column is the primary key. |
XAttribute |
A hierarchical dimension added on top of an XTable (e.g. a 3-level geography hierarchy stored in a lookup table). |
XView |
A startup configuration that assembles one or more XTables into an object hierarchy (parent → child), ready to load in XOE or via Python. |
The Importer class is the main entry point. It provides factory methods
for all four building blocks and keeps the session wired to every object it
creates.
Xsession
└── Importer
├── create_connection() → Connection
├── create_xtable_config() → Xtable_config → import_xtable()
├── create_xattribute_config() → Xattribute_config → import_xattribute()
└── create_xview() → Xview → save() → startup()
Prerequisites
xplainpackage installed (pip install xplain)An Xplain server is running and accessible
A source database is reachable from the Xplain server
from xplain import Xsession
from xplain.tools import Importer
session = Xsession(url="http://myhost:8080", user="admin", password="secret")
importer = Importer(session)
Step 1 — Create a database connection
Pass your JDBC credentials to create_connection(). Call test_connection()
immediately to validate them before running a full import.
conn = importer.create_connection(
databaseType="POSTGRESQL",
url="db.example.com",
user="analyst",
password="dbpassword",
databaseName="warehouse",
portNumber=5432
)
result = conn.test_connection()
print(result) # {'status': 'success'} or {'status': 'error', 'message': '…'}
Save the connection for reuse
Saving avoids storing credentials in every script. Future imports can load by name instead of re-entering credentials.
conn.save("warehouse_pg", "PUBLIC")
Load a saved connection
conn = importer.create_connection(load_from="warehouse_pg")
Explore the database schema
# List all tables in the connected database
tables = conn.get_tables()
print(tables)
# Preview the first 10 rows of a table
df = conn.get_table_content("orders", limit=10)
print(df)
Step 2 — Configure and import an XTable
Create an Xtable_config by pointing it at a table. The server
auto-detects column types and populates dimensionConfigurations.
config = importer.create_xtable_config(
connection=conn,
db_table_name="orders",
object_name="Orders",
primary_key="order_id",
foreign_keys=["customer_id", "product_id"]
)
Review the auto-detected mapping
print(config.show_dimension_configurations())
# dbColumnName dimensionName xplainDataType primaryKey foreignKey
# 0 order_id order_id LONG X
# 1 customer_id customer_id LONG X
# 2 product_id product_id LONG X
# 3 order_date order_date CATEGORIAL
# 4 quantity quantity LONG
# 5 total_amount total_amount LONG
Adjust dimension names and types
Column names from the database are used as-is. Override them to match your analytical naming conventions:
from xplain.tools import XplainType
# Rename dimensions
config.set_dimension_name("order_date", "OrderDate")
config.set_dimension_name("total_amount", "Revenue")
# Override types where auto-detection is wrong
config.set_dimension_xplain_data_type("quantity", XplainType.INT.value)
# Mark a date column so Xplain treats it as a time axis
config.set_time_dimension("OrderDate")
# Drop a column that is not needed for analysis
config.remove_dimension("internal_flag")
Import the XTable
result = config.import_xtable()
print(result) # {'status': 'success'}
Save the configuration for later re-imports
config.save("orders_import", "PUBLIC")
# Reload and re-run at any time:
config2 = importer.create_xtable_config(load_from="orders_import")
config2.import_xtable()
Step 3 — Import child tables
Repeat Step 2 for every table that will appear as a child in the object
hierarchy. Use foreign_keys to declare which columns link to parent tables.
# Customers table (root's child)
customers_cfg = importer.create_xtable_config(
connection=conn,
db_table_name="customers",
object_name="Customers",
primary_key="customer_id"
)
customers_cfg.import_xtable()
# Products table (root's child)
products_cfg = importer.create_xtable_config(
connection=conn,
db_table_name="products",
object_name="Products",
primary_key="product_id"
)
products_cfg.import_xtable()
Step 4 — Build an XView
An Xview assembles the XTables into the parent–child object hierarchy
that Xplain uses during analysis. insert_xtable with as_root=True
sets the root; subsequent calls attach children.
view = importer.create_xview()
# Root object
view.insert_xtable(
xtable="Orders",
as_root=True,
auto_generate_attributes=True
)
# Child objects (linked via foreign keys declared during import)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.insert_xtable(xtable="Products", as_root=False, parent="Orders")
# Save the XView as a startup configuration
view.save("orders_analysis", "PUBLIC")
Optional: add numeric range attributes
view.add_attribute_intervals(
intervals_dict={"Revenue": (0, 100, 500, 1000, 5000, 50000)},
object_name="Orders",
attribute_name="_Group"
)
view.save("orders_analysis", "PUBLIC")
Step 5 — Load the XView for analysis
session.startup("orders_analysis")
session.show_tree()
Advanced options
Filter rows during import
Use where_clause to import only the rows you need. The clause is applied
server-side so no unnecessary data is transferred.
config = importer.create_xtable_config(
connection=conn,
db_table_name="orders",
object_name="RecentOrders",
primary_key="order_id",
foreign_keys=["customer_id"],
where_clause="order_date >= '2023-01-01' AND status = 'completed'"
)
config.import_xtable()
Import from a SQL expression
Replace the table name with an arbitrary SQL query. Useful for views, joins, or computed columns that do not exist as a plain table.
config = importer.create_xtable_config(
connection=conn,
object_name="CustomerRevenue",
primary_key="customer_id"
)
config.add_db_table_sql_expression("""
SELECT
c.customer_id,
c.country,
SUM(o.total_amount) AS lifetime_revenue,
COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.country
""")
config.import_xtable()
Add a derived dimension
Use add_dimension_with_sql_expression to compute a new column on the fly
without modifying the source database.
config = importer.create_xtable_config(
connection=conn,
db_table_name="orders",
primary_key="order_id"
)
config.add_dimension_with_sql_expression(
"OrderYear",
"EXTRACT(YEAR FROM order_date)",
"LONG"
)
config.import_xtable()
Sample rows for testing
Import a fixed number of rows to validate the configuration before a full run.
config = importer.create_xtable_config(
connection=conn,
db_table_name="events",
primary_key="event_id",
sampling_rate=10000
)
config.import_xtable()
Import a pandas DataFrame
No database connection needed — pass an in-memory DataFrame directly.
import pandas as pd
df = pd.read_csv("sales_export.csv")
result = importer.import_dataframe(
dataframe=df,
object_name="SalesExport",
primary_key="id",
type_overrides={"category": "CATEGORIAL", "revenue": "LONG"}
)
print(result) # {'status': 'success'}
For more control over the DataFrame import (inspect the mapping before committing), use the two-step form:
cfg = importer.create_dataframe_xtable_config(
dataframe=df,
object_name="SalesExport",
primary_key="id"
)
print(cfg.show_dimension_configurations())
cfg.import_xtable()
Add a hierarchical attribute (XAttribute)
XAttributes add a multi-level dimension sourced from a lookup table —
for example, a three-level product category hierarchy stored in a
product_hierarchy table.
attr_cfg = importer.create_xattribute_config(
connection_file="warehouse_pg",
db_table_name="product_hierarchy",
attribute_name="ProductCategory",
primary_key="product_id",
hierarchy_level_names=["Department", "Category", "SubCategory"],
hierarchy_level_column_names=["dept_name", "cat_name", "subcat_name"]
)
importer.import_xattribute(attr_cfg)
Inspect what is already imported
# List saved connections
print(importer.list_saved_connections())
# List imported XTables
print(importer.list_xtables())
# List saved XTable import configurations
print(importer.list_xtable_configs())
# List imported XAttributes
print(importer.list_xattributes())
# Retrieve the original config used to create a specific XTable
print(importer.get_config_of_xtable("orders.xtable"))
Complete end-to-end example
from xplain import Xsession
from xplain.tools import Importer, XplainType
# 1. Connect to Xplain
session = Xsession(url="http://myhost:8080", user="admin", password="secret")
importer = Importer(session)
# 2. Create and save a database connection
conn = importer.create_connection(
databaseType="POSTGRESQL",
url="db.example.com",
user="analyst",
password="dbpassword",
databaseName="warehouse",
portNumber=5432
)
conn.test_connection()
conn.save("warehouse_pg", "PUBLIC")
# 3. Import root XTable
orders = importer.create_xtable_config(
connection=conn,
db_table_name="orders",
object_name="Orders",
primary_key="order_id",
foreign_keys=["customer_id", "product_id"],
where_clause="order_date >= '2022-01-01'"
)
orders.set_dimension_name("total_amount", "Revenue")
orders.set_time_dimension("order_date")
orders.import_xtable()
# 4. Import child XTables
importer.create_xtable_config(
connection=conn,
db_table_name="customers",
object_name="Customers",
primary_key="customer_id"
).import_xtable()
importer.create_xtable_config(
connection=conn,
db_table_name="products",
object_name="Products",
primary_key="product_id"
).import_xtable()
# 5. Build and save the XView
view = importer.create_xview()
view.insert_xtable(xtable="Orders", as_root=True, auto_generate_attributes=True)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.insert_xtable(xtable="Products", as_root=False, parent="Orders")
view.save("orders_analysis", "PUBLIC")
# 6. Load and start analysing
session.startup("orders_analysis")
session.show_tree()