Data Import Tools Reference

The xplain.tools module provides a comprehensive toolkit for importing data from external databases into the Xplain platform. It supports the full import workflow: connecting to databases, configuring table imports, building data views, and generating batch processing scripts.

Note

All tools accept an XplainSession (recommended) or the legacy Xsession as the xsession parameter. When using XplainSession, you can also reach the high-level importer directly via xp.data — see XplainSession.

Import Workflow Overview

A typical import workflow follows these steps:

Create a Connection - Configure database access
Create XTable Configs - Define how tables are imported
Import XTables - Execute the import
Create an XView - Define the object hierarchy
Load the View - Start analyzing data

from xplain import create_session
from xplain.tools import Connection, Xtable_config, Xview

session = create_session()  # reads credentials from ~/.xplainpyrc or env vars

# Step 1: Connect to database
conn = Connection(
    xsession=session,
    databaseType="POSTGRESQL",
    url="db.example.com",
    user="analyst",
    password="secret",
    databaseName="warehouse",
    portNumber=5432
)
conn.test_connection()

# Step 2: Configure table imports
orders_config = Xtable_config(
    xsession=session,
    connection=conn,
    db_table_name="orders",
    object_name="Orders",
    primary_key="order_id",
    foreign_keys=["customer_id"]
)
customers_config = Xtable_config(
    xsession=session,
    connection=conn,
    db_table_name="customers",
    object_name="Customers",
    primary_key="customer_id"
)

# Step 3: Import
orders_config.import_xtable()
customers_config.import_xtable()

# Step 4: Create view
view = Xview(session)
view.insert_xtable(xtable="Orders", as_root=True, auto_generate_attributes=True)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.save("sales_view", "PUBLIC")

# Step 5: Load and analyze
session.startup_from_xview_config(view)
session.show_tree()

Using the Importer

The Importer class provides a higher-level interface:

from xplain.tools.importer import Importer

importer = Importer(session)

# Or via the session shortcut:
importer = session.get_importer()

# List existing resources
connections = importer.list_saved_connections()
xtables = importer.list_xtables()
configs = importer.list_xtable_configs()

# Create connection
conn = importer.create_connection(
    databaseType="MYSQL",
    url="localhost",
    user="root",
    password="password",
    databaseName="mydb"
)

# Create and import XTable
config = importer.create_xtable_config(
    connection=conn,
    db_table_name="sales",
    primary_key="id"
)
importer.import_xtable(config)

# Create XView
view = importer.create_xview()
view.insert_xtable(xtable="sales", as_root=True)
view.save("my_view", "PUBLIC")

xplain.tools.Connection

Manages database connections for data import.

Supported Database Types:

POSTGRESQL
MYSQL
ORACLE
SQLSERVER
SNOWFLAKE
BIGQUERY
JDBC-based databases

Constructor Parameters:

Parameter	Type	Description
`xsession`	XplainSession or Xsession	The Xplain session for API communication
`databaseType`	str	Database type identifier (e.g., `POSTGRESQL`)
`url`	str	Database server hostname or IP
`user`	str	Database username
`password`	str	Database password
`databaseName`	str	Database name to connect to
`schema`	str	Database schema (optional)
`portNumber`	int	Database server port
`driverProperties`	dict	Additional JDBC driver properties
`load_from`	str	Path to load an existing connection file
`save_to`	str	Path to save the connection after creation

Methods:

Method	Description
`test_connection()`	Test if the connection is valid. Returns `{'status': 'success'}` or `{'status': 'fail'}`
`get_catalogs()`	List available database catalogs
`get_schemas()`	List available database schemas
`get_tables(catalog, schema_pattern, table_name_pattern)`	List tables matching criteria (supports `%` and `_` wildcards)
`get_table_content(table_name, limit=10)`	Preview table data as DataFrame
`get_default_driver_property_values(verbose=False)`	List available JDBC driver properties
`get_driver_properties()`	Get currently configured driver properties
`add_driver_property(property_name, value)`	Add/update a driver property
`remove_driver_property(property_name)`	Remove a driver property
`save(name, ownership)`	Save configuration to file
`load(file_name)`	Load configuration from file
`to_json()`	Convert to JSON dictionary

Example - Exploring a Database:

conn = Connection(
    xsession=session,
    databaseType="POSTGRESQL",
    url="localhost",
    user="admin",
    password="secret",
    databaseName="warehouse"
)

# Test connection
result = conn.test_connection()
print(result)

# Explore
schemas = conn.get_schemas()
tables = conn.get_tables(schema_pattern="public")
preview = conn.get_table_content("orders", limit=5)
print(preview)

# Save for reuse
conn.save("warehouse_conn", "PUBLIC")

# Load later
conn2 = Connection(xsession=session, load_from="warehouse_conn")

xplain.tools.DataFrameXtableConfig

Configuration for importing pandas DataFrames as XTables (without database connection).

Unlike Xtable_config which requires a database connection, DataFrameXtableConfig works directly with in-memory pandas DataFrames. Column types are automatically inferred from DataFrame dtypes.

Type Inference:

pandas/numpy dtype	Xplain Type	Description
`float32`, `float64`	`LONG`	Floating point numbers
`int64`, `uint64`	`LONG`	Large integers
`int8`, `int16`, `int32`	`INT`	Smaller integers
`datetime64`	`LONG`	Timestamps
`object`, `string`, `category`	`CATEGORIAL`	Text/categorical values
`bool`	`CATEGORIAL`	Boolean values

Constructor Parameters:

Parameter	Type	Description
`xsession`	XplainSession or Xsession	The Xplain session
`dataframe`	DataFrame	The pandas DataFrame to import
`object_name`	str	Name for the XTable in Xplain
`primary_key`	str	Primary key column name (optional)
`foreign_keys`	list	List of foreign key column names (optional)
`sampling_rate`	int	Number of rows to sample (optional)

Methods:

Method	Description
`show_dimension_configurations()`	Show all dimension mappings as DataFrame
`set_dimension_name(column_name, dimension_name)`	Rename a dimension
`set_dimension_xplain_data_type(column_name, type)`	Override inferred data type (LONG, CATEGORIAL, INT)
`set_time_dimension(dimension)`	Mark a dimension as time
`remove_time_dimension(dimension)`	Unmark a time dimension
`remove_dimension(dimension)`	Remove a dimension from config
`import_xtable()`	Execute import via JSON payload (POST /create_xtable_csvdata)
`import_xtable_multipart()`	Execute import via file upload (POST /upload_csv_create_xtable)
`to_json()`	Convert configuration to JSON dictionary

Class Methods:

Method	Description
`from_dataframe(xsession, dataframe, object_name, ...)`	Factory method with optional type_overrides dict

Example - Basic Usage:

import pandas as pd
from xplain.tools import DataFrameXtableConfig

df = pd.DataFrame({
    'id': [1, 2, 3],
    'category': ['A', 'B', 'A'],
    'amount': [100.5, 200.3, 150.0],
    'quantity': [10, 20, 15]
})

config = DataFrameXtableConfig(
    xsession=session,
    dataframe=df,
    object_name="Transactions",
    primary_key="id"
)

# View auto-inferred types
print(config.show_dimension_configurations())
#   dbColumnName dimensionName xplainDataType primaryKey foreignKey
# 0           id            id           LONG          X
# 1     category      category     CATEGORIAL
# 2       amount        amount           LONG
# 3     quantity      quantity           LONG

# Override type if needed
config.set_dimension_xplain_data_type("quantity", "INT")

# Import
result = config.import_xtable()
print(result)  # {'status': 'success', ...}

Example - Using Importer:

from xplain.tools import Importer

importer = Importer(session)

# Create config with type overrides
config = importer.create_dataframe_xtable_config(
    dataframe=df,
    object_name="Sales",
    primary_key="id",
    foreign_keys=["customer_id"],
    type_overrides={
        "category": "CATEGORIAL",
        "status_code": "INT"
    }
)
config.import_xtable()

# Or import directly in one call
result = importer.import_dataframe(
    dataframe=df,
    object_name="QuickImport",
    primary_key="id"
)

Example - Factory Method:

config = DataFrameXtableConfig.from_dataframe(
    xsession=session,
    dataframe=df,
    object_name="Products",
    primary_key="product_id",
    type_overrides={
        "sku": "CATEGORIAL",
        "price": "LONG"
    }
)

xplain.tools.Xtable_config

Configuration for importing database tables as XTables.

Constructor Parameters:

Parameter	Type	Description
`xsession`	XplainSession or Xsession	The Xplain session
`connection`	Connection	Database connection object
`connection_file`	str	Path to saved connection file (alternative to connection)
`db_table_name`	str	Source database table name
`object_name`	str	Name for the XTable in Xplain (defaults to db_table_name)
`primary_key`	str	Primary key column name
`foreign_keys`	list	List of foreign key column names
`where_clause`	str	SQL WHERE clause for filtering
`dbTableSqlExpression`	str	SQL expression instead of table name
`sampling_rate`	int	Number of rows to sample
`load_from`	str	Load existing configuration file

Methods:

Method	Description
`show_dimension_configurations()`	Show all dimension mappings as DataFrame
`set_dimension_name(db_column_name, dimension_name)`	Rename a dimension
`set_dimension_xplain_data_type(db_column_name, type)`	Set data type (LONG, CATEGORIAL, INT)
`set_time_dimension(dimension)`	Mark a dimension as time
`remove_time_dimension(dimension)`	Unmark a time dimension
`remove_dimension(dimension)`	Remove a dimension from config
`add_dimension_with_sql_expression(name, sql, type)`	Add a computed dimension
`add_db_table_sql_expression(sql)`	Use SQL subquery as data source
`read_table_content()`	Preview source table data
`import_xtable()`	Execute the import
`import_xtable_from_dataframe(df)`	Import from pandas DataFrame
`import_xtable_from_csv(csv_file)`	Import from CSV file
`save(file_name, ownership)`	Save configuration
`load(file_name)`	Load configuration
`to_json()`	Convert to JSON dictionary

Example - Customizing Dimensions:

config = Xtable_config(
    xsession=session,
    connection=conn,
    db_table_name="sales_data",
    object_name="Sales",
    primary_key="transaction_id",
    foreign_keys=["customer_id"]
)

# View auto-mapped dimensions
print(config.show_dimension_configurations())

# Rename dimensions
config.set_dimension_name("cust_name", "CustomerName")
config.set_dimension_name("prod_cat", "ProductCategory")

# Change data types
config.set_dimension_xplain_data_type("quantity", "LONG")
config.set_dimension_xplain_data_type("category", "CATEGORIAL")

# Mark time dimensions
config.set_time_dimension("order_date")

# Add computed dimension
config.add_dimension_with_sql_expression(
    "OrderYear",
    "EXTRACT(YEAR FROM order_date)",
    "LONG"
)

# Remove unwanted dimensions
config.remove_dimension("internal_notes")

# Filter rows
config.where_clause = "status = 'completed'"

# Import
result = config.import_xtable()
print(result)  # {'status': 'success'}

# Save for reuse
config.save("sales_config", "PUBLIC")

xplain.tools.Xview

Builder for XView configurations that define the data view structure.

Methods:

Method	Description
`insert_xtable(xtable, as_root, parent, ...)`	Add an XTable to the view
`find_node(name)`	Find a node by object name
`add_script(script)`	Add a transformation script
`add_attribute_intervals(intervals_dict, object_name, attr_name)`	Add interval-based attributes
`add_attribute_intervals_hierarchical(intervals_dict, ...)`	Add hierarchical interval attributes
`add_normalized_attribute(xstartup, target_obj, ...)`	Add normalized response dimension
`save(file_name, ownership)`	Save the view configuration
`to_json()`	Convert to JSON dictionary

insert_xtable Parameters:

Parameter	Type	Description
`xtable`	str	XTable file name (without extension)
`x_table_file_name_pattern`	str	Pattern for matching multiple XTable files
`object_name`	str	Custom object name (defaults to xtable name)
`as_root`	bool	Set as root of the view (default: True)
`parent`	str	Parent object name (required if not root)
`auto_generate_attributes`	bool	Auto-generate attributes (default: True)
`auto_attribute_exclusion_pattern`	str	Regex to exclude dimensions from auto-attributes
`all_global_attributes`	bool	All attributes as global scope
`load_to_memory`	bool	Load XTable into memory
`data_load_method`	str	Data loading method

Example - Building a View:

view = Xview(session)

# Add root table
view.insert_xtable(
    xtable="Patients",
    as_root=True,
    auto_generate_attributes=True
)

# Add child tables
view.insert_xtable(
    xtable="Diagnoses",
    as_root=False,
    parent="Patients"
)
view.insert_xtable(
    xtable="Prescriptions",
    as_root=False,
    parent="Patients"
)

# Add custom attribute intervals
view.add_attribute_intervals(
    intervals_dict={"Age": (0, 18, 35, 50, 65, 100)},
    object_name="Patients",
    attribute_name="_Group"
)

# Add transformation scripts
view.add_script({
    "method": "addPointInTimeAttribute",
    "object": "Diagnoses",
    "dimension": "DiagnosisDate",
    "dateTimeLevels": ["YEAR", "QUARTER", "MONTH"]
})

# Save the view
view.save("patient_analysis", "PUBLIC")

# Load the view as a session
session.startup_from_xview_config(view)

xplain.tools.Importer

High-level orchestrator for the import workflow.

Methods:

Method	Description
`create_connection(...)`	Create a database Connection
`create_xtable_config(...)`	Create an Xtable_config (database-based)
`create_dataframe_xtable_config(dataframe, object_name, ...)`	Create a DataFrameXtableConfig (DataFrame-based)
`import_dataframe(dataframe, object_name, ...)`	Import a DataFrame directly as XTable (one-liner)
`create_xattribute_config(...)`	Create an Xattribute_config for hierarchical attributes
`create_xview()`	Create an Xview builder
`import_xtable(xtable_config)`	Execute an XTable import
`import_xattribute(xattribute_config)`	Execute an XAttribute import
`list_saved_connections()`	List saved connection files
`list_xtables()`	List existing XTable files
`list_xtable_configs()`	List saved XTable configurations
`list_xattributes()`	List existing XAttribute files
`get_config_of_xtable(xtable_file_name)`	Get historical import configuration for an XTable

DataFrame Import Methods:

The Importer provides two ways to import pandas DataFrames:

create_dataframe_xtable_config() - For full control over configuration
import_dataframe() - One-liner convenience method

import pandas as pd
from xplain.tools import Importer

importer = Importer(session)

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'amount': [100.0, 200.0, 150.0]
})

# Method 1: Full control
config = importer.create_dataframe_xtable_config(
    dataframe=df,
    object_name="Users",
    primary_key="id",
    type_overrides={"name": "CATEGORIAL"}
)
config.show_dimension_configurations()
config.set_time_dimension("created_at")  # further customization
config.import_xtable()

# Method 2: One-liner
result = importer.import_dataframe(
    dataframe=df,
    object_name="QuickUsers",
    primary_key="id",
    use_multipart=False  # or True for file upload
)

Example - Hierarchical Attributes:

importer = session.get_importer()

# Create a hierarchical attribute (e.g., geography)
attr_config = importer.create_xattribute_config(
    connection_file="my_connection",
    db_table_name="geography",
    attribute_name="Location",
    primary_key="location_id",
    hierarchy_level_names=["Country", "State", "City"],
    hierarchy_level_column_names=["country_name", "state_name", "city_name"]
)

# Import the attribute
importer.import_xattribute(attr_config)

XGenScript - Batch Processing

The xgenscript module provides functions for generating and executing batch import scripts with time-based partitioning and parallel processing.

Key Functions:

Function	Description
`run_pipeline(xsession, ...)`	Generate and execute a single-table pipeline
`run_pipeline_multi(xsession, table_configs, ...)`	Generate and execute a multi-table pipeline
`generate_xgenscript(...)`	Generate script JSON for a single table
`generate_xgenscript_multi(table_configs, ...)`	Generate script JSON for multiple tables
`execute_xgenscript(xsession, script)`	Execute a generated script
`save_xgenscript(xsession, script, filename)`	Save a script to file

Time Helper Functions:

Function	Description
`generate_time_ranges(start, end, interval)`	Generate time range tuples (hourly, daily, monthly, yearly)
`where_clause_time(start, end, time_column, time_column_type)`	Generate SQL WHERE clause for time range
`get_date_suffix(dt, interval)`	Get date suffix string for file naming

Example - Single Table Pipeline:

import datetime
from xplain.tools import run_pipeline

script = run_pipeline(
    xsession=session,
    configuration_file="sales.xtableconfig",
    table_name="sales",
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 12, 31),
    parent_table="Customers.xtable",
    interval="monthly",
    time_column="order_date",
    records_per_package=500000,
    max_task_threads=3,
    execute=True
)

Example - Multi-Table Pipeline:

import datetime
from xplain.tools import TableConfig, run_pipeline_multi

configs = [
    TableConfig(
        configuration_file="orders.xtableconfig",
        table_name="Orders",
        parent_table="Customers.xtable"
    ),
    TableConfig(
        configuration_file="products.xtableconfig",
        table_name="Products",
        additional_where_clause="active = true"
    ),
    TableConfig(
        configuration_file="metrics.xtableconfig",
        table_name="Metrics",
        time_column_type="epoch_ms"  # For LONG epoch milliseconds
    )
]

script = run_pipeline_multi(
    xsession=session,
    table_configs=configs,
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 12, 31),
    interval="daily",
    time_column="timestamp",
    execute=True
)

Constants:

Constant	Description
`DEFAULT_RECORDS_PER_PACKAGE`	Default records per package (500,000)
`DEFAULT_MAX_TASK_THREADS`	Default parallel threads (3)
`INTERVAL_TO_LENGTH`	Mapping of interval names to date suffix lengths
`FUSION_LEVELS`	Ordered list of fusion levels (daily, monthly, yearly)