Data Import Tools Reference
The xplain.tools module provides a comprehensive toolkit for importing data
from external databases into the Xplain platform. It supports the full import
workflow: connecting to databases, configuring table imports, building data views,
and generating batch processing scripts.
Note
All tools accept an XplainSession (recommended) or the legacy
Xsession as the xsession parameter. When using
XplainSession, you can also reach the high-level importer directly via
xp.data — see XplainSession.
Import Workflow Overview
A typical import workflow follows these steps:
Create a Connection - Configure database access
Create XTable Configs - Define how tables are imported
Import XTables - Execute the import
Create an XView - Define the object hierarchy
Load the View - Start analyzing data
from xplain import create_session
from xplain.tools import Connection, Xtable_config, Xview
session = create_session() # reads credentials from ~/.xplainpyrc or env vars
# Step 1: Connect to database
conn = Connection(
xsession=session,
databaseType="POSTGRESQL",
url="db.example.com",
user="analyst",
password="secret",
databaseName="warehouse",
portNumber=5432
)
conn.test_connection()
# Step 2: Configure table imports
orders_config = Xtable_config(
xsession=session,
connection=conn,
db_table_name="orders",
object_name="Orders",
primary_key="order_id",
foreign_keys=["customer_id"]
)
customers_config = Xtable_config(
xsession=session,
connection=conn,
db_table_name="customers",
object_name="Customers",
primary_key="customer_id"
)
# Step 3: Import
orders_config.import_xtable()
customers_config.import_xtable()
# Step 4: Create view
view = Xview(session)
view.insert_xtable(xtable="Orders", as_root=True, auto_generate_attributes=True)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.save("sales_view", "PUBLIC")
# Step 5: Load and analyze
session.startup_from_xview_config(view)
session.show_tree()
Using the Importer
The Importer class provides a higher-level interface:
from xplain.tools.importer import Importer
importer = Importer(session)
# Or via the session shortcut:
importer = session.get_importer()
# List existing resources
connections = importer.list_saved_connections()
xtables = importer.list_xtables()
configs = importer.list_xtable_configs()
# Create connection
conn = importer.create_connection(
databaseType="MYSQL",
url="localhost",
user="root",
password="password",
databaseName="mydb"
)
# Create and import XTable
config = importer.create_xtable_config(
connection=conn,
db_table_name="sales",
primary_key="id"
)
importer.import_xtable(config)
# Create XView
view = importer.create_xview()
view.insert_xtable(xtable="sales", as_root=True)
view.save("my_view", "PUBLIC")
xplain.tools.Connection
Manages database connections for data import.
Supported Database Types:
POSTGRESQLMYSQLORACLESQLSERVERSNOWFLAKEBIGQUERYJDBC-based databases
Constructor Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
XplainSession or Xsession |
The Xplain session for API communication |
|
str |
Database type identifier (e.g., |
|
str |
Database server hostname or IP |
|
str |
Database username |
|
str |
Database password |
|
str |
Database name to connect to |
|
str |
Database schema (optional) |
|
int |
Database server port |
|
dict |
Additional JDBC driver properties |
|
str |
Path to load an existing connection file |
|
str |
Path to save the connection after creation |
Methods:
Method |
Description |
|---|---|
|
Test if the connection is valid. Returns |
|
List available database catalogs |
|
List available database schemas |
|
List tables matching criteria (supports |
|
Preview table data as DataFrame |
|
List available JDBC driver properties |
|
Get currently configured driver properties |
|
Add/update a driver property |
|
Remove a driver property |
|
Save configuration to file |
|
Load configuration from file |
|
Convert to JSON dictionary |
Example - Exploring a Database:
conn = Connection(
xsession=session,
databaseType="POSTGRESQL",
url="localhost",
user="admin",
password="secret",
databaseName="warehouse"
)
# Test connection
result = conn.test_connection()
print(result)
# Explore
schemas = conn.get_schemas()
tables = conn.get_tables(schema_pattern="public")
preview = conn.get_table_content("orders", limit=5)
print(preview)
# Save for reuse
conn.save("warehouse_conn", "PUBLIC")
# Load later
conn2 = Connection(xsession=session, load_from="warehouse_conn")
xplain.tools.DataFrameXtableConfig
Configuration for importing pandas DataFrames as XTables (without database connection).
Unlike Xtable_config which requires a database connection, DataFrameXtableConfig
works directly with in-memory pandas DataFrames. Column types are automatically inferred
from DataFrame dtypes.
Type Inference:
pandas/numpy dtype |
Xplain Type |
Description |
|---|---|---|
|
|
Floating point numbers |
|
|
Large integers |
|
|
Smaller integers |
|
|
Timestamps |
|
|
Text/categorical values |
|
|
Boolean values |
Constructor Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
XplainSession or Xsession |
The Xplain session |
|
DataFrame |
The pandas DataFrame to import |
|
str |
Name for the XTable in Xplain |
|
str |
Primary key column name (optional) |
|
list |
List of foreign key column names (optional) |
|
int |
Number of rows to sample (optional) |
Methods:
Method |
Description |
|---|---|
|
Show all dimension mappings as DataFrame |
|
Rename a dimension |
|
Override inferred data type (LONG, CATEGORIAL, INT) |
|
Mark a dimension as time |
|
Unmark a time dimension |
|
Remove a dimension from config |
|
Execute import via JSON payload (POST /create_xtable_csvdata) |
|
Execute import via file upload (POST /upload_csv_create_xtable) |
|
Convert configuration to JSON dictionary |
Class Methods:
Method |
Description |
|---|---|
|
Factory method with optional type_overrides dict |
Example - Basic Usage:
import pandas as pd
from xplain.tools import DataFrameXtableConfig
df = pd.DataFrame({
'id': [1, 2, 3],
'category': ['A', 'B', 'A'],
'amount': [100.5, 200.3, 150.0],
'quantity': [10, 20, 15]
})
config = DataFrameXtableConfig(
xsession=session,
dataframe=df,
object_name="Transactions",
primary_key="id"
)
# View auto-inferred types
print(config.show_dimension_configurations())
# dbColumnName dimensionName xplainDataType primaryKey foreignKey
# 0 id id LONG X
# 1 category category CATEGORIAL
# 2 amount amount LONG
# 3 quantity quantity LONG
# Override type if needed
config.set_dimension_xplain_data_type("quantity", "INT")
# Import
result = config.import_xtable()
print(result) # {'status': 'success', ...}
Example - Using Importer:
from xplain.tools import Importer
importer = Importer(session)
# Create config with type overrides
config = importer.create_dataframe_xtable_config(
dataframe=df,
object_name="Sales",
primary_key="id",
foreign_keys=["customer_id"],
type_overrides={
"category": "CATEGORIAL",
"status_code": "INT"
}
)
config.import_xtable()
# Or import directly in one call
result = importer.import_dataframe(
dataframe=df,
object_name="QuickImport",
primary_key="id"
)
Example - Factory Method:
config = DataFrameXtableConfig.from_dataframe(
xsession=session,
dataframe=df,
object_name="Products",
primary_key="product_id",
type_overrides={
"sku": "CATEGORIAL",
"price": "LONG"
}
)
xplain.tools.Xtable_config
Configuration for importing database tables as XTables.
Constructor Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
XplainSession or Xsession |
The Xplain session |
|
Connection |
Database connection object |
|
str |
Path to saved connection file (alternative to connection) |
|
str |
Source database table name |
|
str |
Name for the XTable in Xplain (defaults to db_table_name) |
|
str |
Primary key column name |
|
list |
List of foreign key column names |
|
str |
SQL WHERE clause for filtering |
|
str |
SQL expression instead of table name |
|
int |
Number of rows to sample |
|
str |
Load existing configuration file |
Methods:
Method |
Description |
|---|---|
|
Show all dimension mappings as DataFrame |
|
Rename a dimension |
|
Set data type (LONG, CATEGORIAL, INT) |
|
Mark a dimension as time |
|
Unmark a time dimension |
|
Remove a dimension from config |
|
Add a computed dimension |
|
Use SQL subquery as data source |
|
Preview source table data |
|
Execute the import |
|
Import from pandas DataFrame |
|
Import from CSV file |
|
Save configuration |
|
Load configuration |
|
Convert to JSON dictionary |
Example - Customizing Dimensions:
config = Xtable_config(
xsession=session,
connection=conn,
db_table_name="sales_data",
object_name="Sales",
primary_key="transaction_id",
foreign_keys=["customer_id"]
)
# View auto-mapped dimensions
print(config.show_dimension_configurations())
# Rename dimensions
config.set_dimension_name("cust_name", "CustomerName")
config.set_dimension_name("prod_cat", "ProductCategory")
# Change data types
config.set_dimension_xplain_data_type("quantity", "LONG")
config.set_dimension_xplain_data_type("category", "CATEGORIAL")
# Mark time dimensions
config.set_time_dimension("order_date")
# Add computed dimension
config.add_dimension_with_sql_expression(
"OrderYear",
"EXTRACT(YEAR FROM order_date)",
"LONG"
)
# Remove unwanted dimensions
config.remove_dimension("internal_notes")
# Filter rows
config.where_clause = "status = 'completed'"
# Import
result = config.import_xtable()
print(result) # {'status': 'success'}
# Save for reuse
config.save("sales_config", "PUBLIC")
xplain.tools.Xview
Builder for XView configurations that define the data view structure.
Methods:
Method |
Description |
|---|---|
|
Add an XTable to the view |
|
Find a node by object name |
|
Add a transformation script |
|
Add interval-based attributes |
|
Add hierarchical interval attributes |
|
Add normalized response dimension |
|
Save the view configuration |
|
Convert to JSON dictionary |
insert_xtable Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
str |
XTable file name (without extension) |
|
str |
Pattern for matching multiple XTable files |
|
str |
Custom object name (defaults to xtable name) |
|
bool |
Set as root of the view (default: True) |
|
str |
Parent object name (required if not root) |
|
bool |
Auto-generate attributes (default: True) |
|
str |
Regex to exclude dimensions from auto-attributes |
|
bool |
All attributes as global scope |
|
bool |
Load XTable into memory |
|
str |
Data loading method |
Example - Building a View:
view = Xview(session)
# Add root table
view.insert_xtable(
xtable="Patients",
as_root=True,
auto_generate_attributes=True
)
# Add child tables
view.insert_xtable(
xtable="Diagnoses",
as_root=False,
parent="Patients"
)
view.insert_xtable(
xtable="Prescriptions",
as_root=False,
parent="Patients"
)
# Add custom attribute intervals
view.add_attribute_intervals(
intervals_dict={"Age": (0, 18, 35, 50, 65, 100)},
object_name="Patients",
attribute_name="_Group"
)
# Add transformation scripts
view.add_script({
"method": "addPointInTimeAttribute",
"object": "Diagnoses",
"dimension": "DiagnosisDate",
"dateTimeLevels": ["YEAR", "QUARTER", "MONTH"]
})
# Save the view
view.save("patient_analysis", "PUBLIC")
# Load the view as a session
session.startup_from_xview_config(view)
xplain.tools.Importer
High-level orchestrator for the import workflow.
Methods:
Method |
Description |
|---|---|
|
Create a database Connection |
|
Create an Xtable_config (database-based) |
|
Create a DataFrameXtableConfig (DataFrame-based) |
|
Import a DataFrame directly as XTable (one-liner) |
|
Create an Xattribute_config for hierarchical attributes |
|
Create an Xview builder |
|
Execute an XTable import |
|
Execute an XAttribute import |
|
List saved connection files |
|
List existing XTable files |
|
List saved XTable configurations |
|
List existing XAttribute files |
|
Get historical import configuration for an XTable |
DataFrame Import Methods:
The Importer provides two ways to import pandas DataFrames:
create_dataframe_xtable_config() - For full control over configuration
import_dataframe() - One-liner convenience method
import pandas as pd
from xplain.tools import Importer
importer = Importer(session)
df = pd.DataFrame({
'id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie'],
'amount': [100.0, 200.0, 150.0]
})
# Method 1: Full control
config = importer.create_dataframe_xtable_config(
dataframe=df,
object_name="Users",
primary_key="id",
type_overrides={"name": "CATEGORIAL"}
)
config.show_dimension_configurations()
config.set_time_dimension("created_at") # further customization
config.import_xtable()
# Method 2: One-liner
result = importer.import_dataframe(
dataframe=df,
object_name="QuickUsers",
primary_key="id",
use_multipart=False # or True for file upload
)
Example - Hierarchical Attributes:
importer = session.get_importer()
# Create a hierarchical attribute (e.g., geography)
attr_config = importer.create_xattribute_config(
connection_file="my_connection",
db_table_name="geography",
attribute_name="Location",
primary_key="location_id",
hierarchy_level_names=["Country", "State", "City"],
hierarchy_level_column_names=["country_name", "state_name", "city_name"]
)
# Import the attribute
importer.import_xattribute(attr_config)
XGenScript - Batch Processing
The xgenscript module provides functions for generating and executing
batch import scripts with time-based partitioning and parallel processing.
Key Functions:
Function |
Description |
|---|---|
|
Generate and execute a single-table pipeline |
|
Generate and execute a multi-table pipeline |
|
Generate script JSON for a single table |
|
Generate script JSON for multiple tables |
|
Execute a generated script |
|
Save a script to file |
Time Helper Functions:
Function |
Description |
|---|---|
|
Generate time range tuples (hourly, daily, monthly, yearly) |
|
Generate SQL WHERE clause for time range |
|
Get date suffix string for file naming |
Example - Single Table Pipeline:
import datetime
from xplain.tools import run_pipeline
script = run_pipeline(
xsession=session,
configuration_file="sales.xtableconfig",
table_name="sales",
start_date=datetime.datetime(2024, 1, 1),
end_date=datetime.datetime(2024, 12, 31),
parent_table="Customers.xtable",
interval="monthly",
time_column="order_date",
records_per_package=500000,
max_task_threads=3,
execute=True
)
Example - Multi-Table Pipeline:
import datetime
from xplain.tools import TableConfig, run_pipeline_multi
configs = [
TableConfig(
configuration_file="orders.xtableconfig",
table_name="Orders",
parent_table="Customers.xtable"
),
TableConfig(
configuration_file="products.xtableconfig",
table_name="Products",
additional_where_clause="active = true"
),
TableConfig(
configuration_file="metrics.xtableconfig",
table_name="Metrics",
time_column_type="epoch_ms" # For LONG epoch milliseconds
)
]
script = run_pipeline_multi(
xsession=session,
table_configs=configs,
start_date=datetime.datetime(2024, 1, 1),
end_date=datetime.datetime(2024, 12, 31),
interval="daily",
time_column="timestamp",
execute=True
)
Constants:
Constant |
Description |
|---|---|
|
Default records per package (500,000) |
|
Default parallel threads (3) |
|
Mapping of interval names to date suffix lengths |
|
Ordered list of fusion levels (daily, monthly, yearly) |