Data Import Tools Reference
The xplain.tools module provides a comprehensive toolkit for importing data
from external databases into the Xplain platform. It supports the full import
workflow: connecting to databases, configuring table imports, building data views,
and generating batch processing scripts.
Import Workflow Overview
A typical import workflow follows these steps:
Create a Connection - Configure database access
Create XTable Configs - Define how tables are imported
Import XTables - Execute the import
Create an XView - Define the object hierarchy
Load the View - Start analyzing data
from xplain import Xsession
from xplain.tools import Connection, Xtable_config, Xview
session = Xsession(url="http://myhost:8080", user="admin", password="secret")
# Step 1: Connect to database
conn = Connection(
xsession=session,
databaseType="POSTGRESQL",
url="db.example.com",
user="analyst",
password="secret",
databaseName="warehouse",
portNumber=5432
)
conn.test_connection()
# Step 2: Configure table imports
orders_config = Xtable_config(
xsession=session,
connection=conn,
db_table_name="orders",
object_name="Orders",
primary_key="order_id",
foreign_keys=["customer_id"]
)
customers_config = Xtable_config(
xsession=session,
connection=conn,
db_table_name="customers",
object_name="Customers",
primary_key="customer_id"
)
# Step 3: Import
orders_config.import_xtable()
customers_config.import_xtable()
# Step 4: Create view
view = Xview(session)
view.insert_xtable(xtable="Orders", as_root=True, auto_generate_attributes=True)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.save("sales_view", "PUBLIC")
# Step 5: Load and analyze
session.startup_from_xview_config(view)
session.show_tree()
Using the Importer
The Importer class provides a higher-level interface:
from xplain.tools.importer import Importer
importer = Importer(session)
# Or via the session shortcut:
importer = session.get_importer()
# List existing resources
connections = importer.list_saved_connections()
xtables = importer.list_xtables()
configs = importer.list_xtable_configs()
# Create connection
conn = importer.create_connection(
databaseType="MYSQL",
url="localhost",
user="root",
password="password",
databaseName="mydb"
)
# Create and import XTable
config = importer.create_xtable_config(
connection=conn,
db_table_name="sales",
primary_key="id"
)
importer.import_xtable(config)
# Create XView
view = importer.create_xview()
view.insert_xtable(xtable="sales", as_root=True)
view.save("my_view", "PUBLIC")
xplain.tools.Connection
Manages database connections for data import.
Supported Database Types:
POSTGRESQLMYSQLORACLESQLSERVERSNOWFLAKEBIGQUERYJDBC-based databases
Constructor Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
Xsession |
The Xplain session for API communication |
|
str |
Database type identifier (e.g., |
|
str |
Database server hostname or IP |
|
str |
Database username |
|
str |
Database password |
|
str |
Database name to connect to |
|
str |
Database schema (optional) |
|
int |
Database server port |
|
dict |
Additional JDBC driver properties |
|
str |
Path to load an existing connection file |
|
str |
Path to save the connection after creation |
Methods:
Method |
Description |
|---|---|
|
Test if the connection is valid. Returns |
|
List available database catalogs |
|
List available database schemas |
|
List tables matching criteria (supports |
|
Preview table data as DataFrame |
|
List available JDBC driver properties |
|
Get currently configured driver properties |
|
Add/update a driver property |
|
Remove a driver property |
|
Save configuration to file |
|
Load configuration from file |
|
Convert to JSON dictionary |
Example - Exploring a Database:
conn = Connection(
xsession=session,
databaseType="POSTGRESQL",
url="localhost",
user="admin",
password="secret",
databaseName="warehouse"
)
# Test connection
result = conn.test_connection()
print(result)
# Explore
schemas = conn.get_schemas()
tables = conn.get_tables(schema_pattern="public")
preview = conn.get_table_content("orders", limit=5)
print(preview)
# Save for reuse
conn.save("warehouse_conn", "PUBLIC")
# Load later
conn2 = Connection(xsession=session, load_from="warehouse_conn")
xplain.tools.DataFrameXtableConfig
Configuration for importing pandas DataFrames as XTables (without database connection).
Unlike Xtable_config which requires a database connection, DataFrameXtableConfig
works directly with in-memory pandas DataFrames. Column types are automatically inferred
from DataFrame dtypes.
Type Inference:
pandas/numpy dtype |
Xplain Type |
Description |
|---|---|---|
|
|
Floating point numbers |
|
|
Large integers |
|
|
Smaller integers |
|
|
Timestamps |
|
|
Text/categorical values |
|
|
Boolean values |
Constructor Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
Xsession |
The Xplain session |
|
DataFrame |
The pandas DataFrame to import |
|
str |
Name for the XTable in Xplain |
|
str |
Primary key column name (optional) |
|
list |
List of foreign key column names (optional) |
|
int |
Number of rows to sample (optional) |
Methods:
Method |
Description |
|---|---|
|
Show all dimension mappings as DataFrame |
|
Rename a dimension |
|
Override inferred data type (LONG, CATEGORIAL, INT) |
|
Mark a dimension as time |
|
Unmark a time dimension |
|
Remove a dimension from config |
|
Execute import via JSON payload (POST /create_xtable_csvdata) |
|
Execute import via file upload (POST /upload_csv_create_xtable) |
|
Convert configuration to JSON dictionary |
Class Methods:
Method |
Description |
|---|---|
|
Factory method with optional type_overrides dict |
Example - Basic Usage:
import pandas as pd
from xplain.tools import DataFrameXtableConfig
df = pd.DataFrame({
'id': [1, 2, 3],
'category': ['A', 'B', 'A'],
'amount': [100.5, 200.3, 150.0],
'quantity': [10, 20, 15]
})
config = DataFrameXtableConfig(
xsession=session,
dataframe=df,
object_name="Transactions",
primary_key="id"
)
# View auto-inferred types
print(config.show_dimension_configurations())
# dbColumnName dimensionName xplainDataType primaryKey foreignKey
# 0 id id LONG X
# 1 category category CATEGORIAL
# 2 amount amount LONG
# 3 quantity quantity LONG
# Override type if needed
config.set_dimension_xplain_data_type("quantity", "INT")
# Import
result = config.import_xtable()
print(result) # {'status': 'success', ...}
Example - Using Importer:
from xplain.tools import Importer
importer = Importer(session)
# Create config with type overrides
config = importer.create_dataframe_xtable_config(
dataframe=df,
object_name="Sales",
primary_key="id",
foreign_keys=["customer_id"],
type_overrides={
"category": "CATEGORIAL",
"status_code": "INT"
}
)
config.import_xtable()
# Or import directly in one call
result = importer.import_dataframe(
dataframe=df,
object_name="QuickImport",
primary_key="id"
)
Example - Factory Method:
config = DataFrameXtableConfig.from_dataframe(
xsession=session,
dataframe=df,
object_name="Products",
primary_key="product_id",
type_overrides={
"sku": "CATEGORIAL",
"price": "LONG"
}
)
xplain.tools.Xtable_config
Configuration for importing database tables as XTables.
Constructor Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
Xsession |
The Xplain session |
|
Connection |
Database connection object |
|
str |
Path to saved connection file (alternative to connection) |
|
str |
Source database table name |
|
str |
Name for the XTable in Xplain (defaults to db_table_name) |
|
str |
Primary key column name |
|
list |
List of foreign key column names |
|
str |
SQL WHERE clause for filtering |
|
str |
SQL expression instead of table name |
|
int |
Number of rows to sample |
|
str |
Load existing configuration file |
Methods:
Method |
Description |
|---|---|
|
Show all dimension mappings as DataFrame |
|
Rename a dimension |
|
Set data type (LONG, CATEGORIAL, INT) |
|
Mark a dimension as time |
|
Unmark a time dimension |
|
Remove a dimension from config |
|
Add a computed dimension |
|
Use SQL subquery as data source |
|
Preview source table data |
|
Execute the import |
|
Import from pandas DataFrame |
|
Import from CSV file |
|
Save configuration |
|
Load configuration |
|
Convert to JSON dictionary |
Example - Customizing Dimensions:
config = Xtable_config(
xsession=session,
connection=conn,
db_table_name="sales_data",
object_name="Sales",
primary_key="transaction_id",
foreign_keys=["customer_id"]
)
# View auto-mapped dimensions
print(config.show_dimension_configurations())
# Rename dimensions
config.set_dimension_name("cust_name", "CustomerName")
config.set_dimension_name("prod_cat", "ProductCategory")
# Change data types
config.set_dimension_xplain_data_type("quantity", "LONG")
config.set_dimension_xplain_data_type("category", "CATEGORIAL")
# Mark time dimensions
config.set_time_dimension("order_date")
# Add computed dimension
config.add_dimension_with_sql_expression(
"OrderYear",
"EXTRACT(YEAR FROM order_date)",
"LONG"
)
# Remove unwanted dimensions
config.remove_dimension("internal_notes")
# Filter rows
config.where_clause = "status = 'completed'"
# Import
result = config.import_xtable()
print(result) # {'status': 'success'}
# Save for reuse
config.save("sales_config", "PUBLIC")
xplain.tools.Xview
Builder for XView configurations that define the data view structure.
Methods:
Method |
Description |
|---|---|
|
Add an XTable to the view |
|
Find a node by object name |
|
Add a transformation script |
|
Add interval-based attributes |
|
Add hierarchical interval attributes |
|
Add normalized response dimension |
|
Save the view configuration |
|
Convert to JSON dictionary |
insert_xtable Parameters:
Parameter |
Type |
Description |
|---|---|---|
|
str |
XTable file name (without extension) |
|
str |
Pattern for matching multiple XTable files |
|
str |
Custom object name (defaults to xtable name) |
|
bool |
Set as root of the view (default: True) |
|
str |
Parent object name (required if not root) |
|
bool |
Auto-generate attributes (default: True) |
|
str |
Regex to exclude dimensions from auto-attributes |
|
bool |
All attributes as global scope |
|
bool |
Load XTable into memory |
|
str |
Data loading method |
Example - Building a View:
view = Xview(session)
# Add root table
view.insert_xtable(
xtable="Patients",
as_root=True,
auto_generate_attributes=True
)
# Add child tables
view.insert_xtable(
xtable="Diagnoses",
as_root=False,
parent="Patients"
)
view.insert_xtable(
xtable="Prescriptions",
as_root=False,
parent="Patients"
)
# Add custom attribute intervals
view.add_attribute_intervals(
intervals_dict={"Age": (0, 18, 35, 50, 65, 100)},
object_name="Patients",
attribute_name="_Group"
)
# Add transformation scripts
view.add_script({
"method": "addPointInTimeAttribute",
"object": "Diagnoses",
"dimension": "DiagnosisDate",
"dateTimeLevels": ["YEAR", "QUARTER", "MONTH"]
})
# Save the view
view.save("patient_analysis", "PUBLIC")
# Load the view as a session
session.startup_from_xview_config(view)
xplain.tools.Importer
High-level orchestrator for the import workflow.
Methods:
Method |
Description |
|---|---|
|
Create a database Connection |
|
Create an Xtable_config (database-based) |
|
Create a DataFrameXtableConfig (DataFrame-based) |
|
Import a DataFrame directly as XTable (one-liner) |
|
Create an Xattribute_config for hierarchical attributes |
|
Create an Xview builder |
|
Execute an XTable import |
|
Execute an XAttribute import |
|
List saved connection files |
|
List existing XTable files |
|
List saved XTable configurations |
|
List existing XAttribute files |
|
Get historical import configuration for an XTable |
DataFrame Import Methods:
The Importer provides two ways to import pandas DataFrames:
create_dataframe_xtable_config() - For full control over configuration
import_dataframe() - One-liner convenience method
import pandas as pd
from xplain.tools import Importer
importer = Importer(session)
df = pd.DataFrame({
'id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie'],
'amount': [100.0, 200.0, 150.0]
})
# Method 1: Full control
config = importer.create_dataframe_xtable_config(
dataframe=df,
object_name="Users",
primary_key="id",
type_overrides={"name": "CATEGORIAL"}
)
config.show_dimension_configurations()
config.set_time_dimension("created_at") # further customization
config.import_xtable()
# Method 2: One-liner
result = importer.import_dataframe(
dataframe=df,
object_name="QuickUsers",
primary_key="id",
use_multipart=False # or True for file upload
)
Example - Hierarchical Attributes:
importer = session.get_importer()
# Create a hierarchical attribute (e.g., geography)
attr_config = importer.create_xattribute_config(
connection_file="my_connection",
db_table_name="geography",
attribute_name="Location",
primary_key="location_id",
hierarchy_level_names=["Country", "State", "City"],
hierarchy_level_column_names=["country_name", "state_name", "city_name"]
)
# Import the attribute
importer.import_xattribute(attr_config)
XGenScript - Batch Processing
The xgenscript module provides functions for generating and executing
batch import scripts with time-based partitioning and parallel processing.
Key Functions:
Function |
Description |
|---|---|
|
Generate and execute a single-table pipeline |
|
Generate and execute a multi-table pipeline |
|
Generate script JSON for a single table |
|
Generate script JSON for multiple tables |
|
Execute a generated script |
|
Save a script to file |
Time Helper Functions:
Function |
Description |
|---|---|
|
Generate time range tuples (hourly, daily, monthly, yearly) |
|
Generate SQL WHERE clause for time range |
|
Get date suffix string for file naming |
Example - Single Table Pipeline:
import datetime
from xplain.tools import run_pipeline
script = run_pipeline(
xsession=session,
configuration_file="sales.xtableconfig",
table_name="sales",
start_date=datetime.datetime(2024, 1, 1),
end_date=datetime.datetime(2024, 12, 31),
parent_table="Customers.xtable",
interval="monthly",
time_column="order_date",
records_per_package=500000,
max_task_threads=3,
execute=True
)
Example - Multi-Table Pipeline:
import datetime
from xplain.tools import TableConfig, run_pipeline_multi
configs = [
TableConfig(
configuration_file="orders.xtableconfig",
table_name="Orders",
parent_table="Customers.xtable"
),
TableConfig(
configuration_file="products.xtableconfig",
table_name="Products",
additional_where_clause="active = true"
),
TableConfig(
configuration_file="metrics.xtableconfig",
table_name="Metrics",
time_column_type="epoch_ms" # For LONG epoch milliseconds
)
]
script = run_pipeline_multi(
xsession=session,
table_configs=configs,
start_date=datetime.datetime(2024, 1, 1),
end_date=datetime.datetime(2024, 12, 31),
interval="daily",
time_column="timestamp",
execute=True
)
Constants:
Constant |
Description |
|---|---|
|
Default records per package (500,000) |
|
Default parallel threads (3) |
|
Mapping of interval names to date suffix lengths |
|
Ordered list of fusion levels (daily, monthly, yearly) |