Data Import Tools Reference

The xplain.tools module provides a comprehensive toolkit for importing data from external databases into the Xplain platform. It supports the full import workflow: connecting to databases, configuring table imports, building data views, and generating batch processing scripts.

Import Workflow Overview

A typical import workflow follows these steps:

  1. Create a Connection - Configure database access

  2. Create XTable Configs - Define how tables are imported

  3. Import XTables - Execute the import

  4. Create an XView - Define the object hierarchy

  5. Load the View - Start analyzing data

from xplain import Xsession
from xplain.tools import Connection, Xtable_config, Xview

session = Xsession(url="http://myhost:8080", user="admin", password="secret")

# Step 1: Connect to database
conn = Connection(
    xsession=session,
    databaseType="POSTGRESQL",
    url="db.example.com",
    user="analyst",
    password="secret",
    databaseName="warehouse",
    portNumber=5432
)
conn.test_connection()

# Step 2: Configure table imports
orders_config = Xtable_config(
    xsession=session,
    connection=conn,
    db_table_name="orders",
    object_name="Orders",
    primary_key="order_id",
    foreign_keys=["customer_id"]
)
customers_config = Xtable_config(
    xsession=session,
    connection=conn,
    db_table_name="customers",
    object_name="Customers",
    primary_key="customer_id"
)

# Step 3: Import
orders_config.import_xtable()
customers_config.import_xtable()

# Step 4: Create view
view = Xview(session)
view.insert_xtable(xtable="Orders", as_root=True, auto_generate_attributes=True)
view.insert_xtable(xtable="Customers", as_root=False, parent="Orders")
view.save("sales_view", "PUBLIC")

# Step 5: Load and analyze
session.startup_from_xview_config(view)
session.show_tree()

Using the Importer

The Importer class provides a higher-level interface:

from xplain.tools.importer import Importer

importer = Importer(session)

# Or via the session shortcut:
importer = session.get_importer()

# List existing resources
connections = importer.list_saved_connections()
xtables = importer.list_xtables()
configs = importer.list_xtable_configs()

# Create connection
conn = importer.create_connection(
    databaseType="MYSQL",
    url="localhost",
    user="root",
    password="password",
    databaseName="mydb"
)

# Create and import XTable
config = importer.create_xtable_config(
    connection=conn,
    db_table_name="sales",
    primary_key="id"
)
importer.import_xtable(config)

# Create XView
view = importer.create_xview()
view.insert_xtable(xtable="sales", as_root=True)
view.save("my_view", "PUBLIC")

xplain.tools.Connection

Manages database connections for data import.

Supported Database Types:

  • POSTGRESQL

  • MYSQL

  • ORACLE

  • SQLSERVER

  • SNOWFLAKE

  • BIGQUERY

  • JDBC-based databases

Constructor Parameters:

Parameter

Type

Description

xsession

Xsession

The Xplain session for API communication

databaseType

str

Database type identifier (e.g., POSTGRESQL)

url

str

Database server hostname or IP

user

str

Database username

password

str

Database password

databaseName

str

Database name to connect to

schema

str

Database schema (optional)

portNumber

int

Database server port

driverProperties

dict

Additional JDBC driver properties

load_from

str

Path to load an existing connection file

save_to

str

Path to save the connection after creation

Methods:

Method

Description

test_connection()

Test if the connection is valid. Returns {'status': 'success'} or {'status': 'fail'}

get_catalogs()

List available database catalogs

get_schemas()

List available database schemas

get_tables(catalog, schema_pattern, table_name_pattern)

List tables matching criteria (supports % and _ wildcards)

get_table_content(table_name, limit=10)

Preview table data as DataFrame

get_default_driver_property_values(verbose=False)

List available JDBC driver properties

get_driver_properties()

Get currently configured driver properties

add_driver_property(property_name, value)

Add/update a driver property

remove_driver_property(property_name)

Remove a driver property

save(name, ownership)

Save configuration to file

load(file_name)

Load configuration from file

to_json()

Convert to JSON dictionary

Example - Exploring a Database:

conn = Connection(
    xsession=session,
    databaseType="POSTGRESQL",
    url="localhost",
    user="admin",
    password="secret",
    databaseName="warehouse"
)

# Test connection
result = conn.test_connection()
print(result)

# Explore
schemas = conn.get_schemas()
tables = conn.get_tables(schema_pattern="public")
preview = conn.get_table_content("orders", limit=5)
print(preview)

# Save for reuse
conn.save("warehouse_conn", "PUBLIC")

# Load later
conn2 = Connection(xsession=session, load_from="warehouse_conn")

xplain.tools.DataFrameXtableConfig

Configuration for importing pandas DataFrames as XTables (without database connection).

Unlike Xtable_config which requires a database connection, DataFrameXtableConfig works directly with in-memory pandas DataFrames. Column types are automatically inferred from DataFrame dtypes.

Type Inference:

pandas/numpy dtype

Xplain Type

Description

float32, float64

LONG

Floating point numbers

int64, uint64

LONG

Large integers

int8, int16, int32

INT

Smaller integers

datetime64

LONG

Timestamps

object, string, category

CATEGORIAL

Text/categorical values

bool

CATEGORIAL

Boolean values

Constructor Parameters:

Parameter

Type

Description

xsession

Xsession

The Xplain session

dataframe

DataFrame

The pandas DataFrame to import

object_name

str

Name for the XTable in Xplain

primary_key

str

Primary key column name (optional)

foreign_keys

list

List of foreign key column names (optional)

sampling_rate

int

Number of rows to sample (optional)

Methods:

Method

Description

show_dimension_configurations()

Show all dimension mappings as DataFrame

set_dimension_name(column_name, dimension_name)

Rename a dimension

set_dimension_xplain_data_type(column_name, type)

Override inferred data type (LONG, CATEGORIAL, INT)

set_time_dimension(dimension)

Mark a dimension as time

remove_time_dimension(dimension)

Unmark a time dimension

remove_dimension(dimension)

Remove a dimension from config

import_xtable()

Execute import via JSON payload (POST /create_xtable_csvdata)

import_xtable_multipart()

Execute import via file upload (POST /upload_csv_create_xtable)

to_json()

Convert configuration to JSON dictionary

Class Methods:

Method

Description

from_dataframe(xsession, dataframe, object_name, ...)

Factory method with optional type_overrides dict

Example - Basic Usage:

import pandas as pd
from xplain.tools import DataFrameXtableConfig

df = pd.DataFrame({
    'id': [1, 2, 3],
    'category': ['A', 'B', 'A'],
    'amount': [100.5, 200.3, 150.0],
    'quantity': [10, 20, 15]
})

config = DataFrameXtableConfig(
    xsession=session,
    dataframe=df,
    object_name="Transactions",
    primary_key="id"
)

# View auto-inferred types
print(config.show_dimension_configurations())
#   dbColumnName dimensionName xplainDataType primaryKey foreignKey
# 0           id            id           LONG          X
# 1     category      category     CATEGORIAL
# 2       amount        amount           LONG
# 3     quantity      quantity           LONG

# Override type if needed
config.set_dimension_xplain_data_type("quantity", "INT")

# Import
result = config.import_xtable()
print(result)  # {'status': 'success', ...}

Example - Using Importer:

from xplain.tools import Importer

importer = Importer(session)

# Create config with type overrides
config = importer.create_dataframe_xtable_config(
    dataframe=df,
    object_name="Sales",
    primary_key="id",
    foreign_keys=["customer_id"],
    type_overrides={
        "category": "CATEGORIAL",
        "status_code": "INT"
    }
)
config.import_xtable()

# Or import directly in one call
result = importer.import_dataframe(
    dataframe=df,
    object_name="QuickImport",
    primary_key="id"
)

Example - Factory Method:

config = DataFrameXtableConfig.from_dataframe(
    xsession=session,
    dataframe=df,
    object_name="Products",
    primary_key="product_id",
    type_overrides={
        "sku": "CATEGORIAL",
        "price": "LONG"
    }
)

xplain.tools.Xtable_config

Configuration for importing database tables as XTables.

Constructor Parameters:

Parameter

Type

Description

xsession

Xsession

The Xplain session

connection

Connection

Database connection object

connection_file

str

Path to saved connection file (alternative to connection)

db_table_name

str

Source database table name

object_name

str

Name for the XTable in Xplain (defaults to db_table_name)

primary_key

str

Primary key column name

foreign_keys

list

List of foreign key column names

where_clause

str

SQL WHERE clause for filtering

dbTableSqlExpression

str

SQL expression instead of table name

sampling_rate

int

Number of rows to sample

load_from

str

Load existing configuration file

Methods:

Method

Description

show_dimension_configurations()

Show all dimension mappings as DataFrame

set_dimension_name(db_column_name, dimension_name)

Rename a dimension

set_dimension_xplain_data_type(db_column_name, type)

Set data type (LONG, CATEGORIAL, INT)

set_time_dimension(dimension)

Mark a dimension as time

remove_time_dimension(dimension)

Unmark a time dimension

remove_dimension(dimension)

Remove a dimension from config

add_dimension_with_sql_expression(name, sql, type)

Add a computed dimension

add_db_table_sql_expression(sql)

Use SQL subquery as data source

read_table_content()

Preview source table data

import_xtable()

Execute the import

import_xtable_from_dataframe(df)

Import from pandas DataFrame

import_xtable_from_csv(csv_file)

Import from CSV file

save(file_name, ownership)

Save configuration

load(file_name)

Load configuration

to_json()

Convert to JSON dictionary

Example - Customizing Dimensions:

config = Xtable_config(
    xsession=session,
    connection=conn,
    db_table_name="sales_data",
    object_name="Sales",
    primary_key="transaction_id",
    foreign_keys=["customer_id"]
)

# View auto-mapped dimensions
print(config.show_dimension_configurations())

# Rename dimensions
config.set_dimension_name("cust_name", "CustomerName")
config.set_dimension_name("prod_cat", "ProductCategory")

# Change data types
config.set_dimension_xplain_data_type("quantity", "LONG")
config.set_dimension_xplain_data_type("category", "CATEGORIAL")

# Mark time dimensions
config.set_time_dimension("order_date")

# Add computed dimension
config.add_dimension_with_sql_expression(
    "OrderYear",
    "EXTRACT(YEAR FROM order_date)",
    "LONG"
)

# Remove unwanted dimensions
config.remove_dimension("internal_notes")

# Filter rows
config.where_clause = "status = 'completed'"

# Import
result = config.import_xtable()
print(result)  # {'status': 'success'}

# Save for reuse
config.save("sales_config", "PUBLIC")

xplain.tools.Xview

Builder for XView configurations that define the data view structure.

Methods:

Method

Description

insert_xtable(xtable, as_root, parent, ...)

Add an XTable to the view

find_node(name)

Find a node by object name

add_script(script)

Add a transformation script

add_attribute_intervals(intervals_dict, object_name, attr_name)

Add interval-based attributes

add_attribute_intervals_hierarchical(intervals_dict, ...)

Add hierarchical interval attributes

add_normalized_attribute(xstartup, target_obj, ...)

Add normalized response dimension

save(file_name, ownership)

Save the view configuration

to_json()

Convert to JSON dictionary

insert_xtable Parameters:

Parameter

Type

Description

xtable

str

XTable file name (without extension)

x_table_file_name_pattern

str

Pattern for matching multiple XTable files

object_name

str

Custom object name (defaults to xtable name)

as_root

bool

Set as root of the view (default: True)

parent

str

Parent object name (required if not root)

auto_generate_attributes

bool

Auto-generate attributes (default: True)

auto_attribute_exclusion_pattern

str

Regex to exclude dimensions from auto-attributes

all_global_attributes

bool

All attributes as global scope

load_to_memory

bool

Load XTable into memory

data_load_method

str

Data loading method

Example - Building a View:

view = Xview(session)

# Add root table
view.insert_xtable(
    xtable="Patients",
    as_root=True,
    auto_generate_attributes=True
)

# Add child tables
view.insert_xtable(
    xtable="Diagnoses",
    as_root=False,
    parent="Patients"
)
view.insert_xtable(
    xtable="Prescriptions",
    as_root=False,
    parent="Patients"
)

# Add custom attribute intervals
view.add_attribute_intervals(
    intervals_dict={"Age": (0, 18, 35, 50, 65, 100)},
    object_name="Patients",
    attribute_name="_Group"
)

# Add transformation scripts
view.add_script({
    "method": "addPointInTimeAttribute",
    "object": "Diagnoses",
    "dimension": "DiagnosisDate",
    "dateTimeLevels": ["YEAR", "QUARTER", "MONTH"]
})

# Save the view
view.save("patient_analysis", "PUBLIC")

# Load the view as a session
session.startup_from_xview_config(view)

xplain.tools.Importer

High-level orchestrator for the import workflow.

Methods:

Method

Description

create_connection(...)

Create a database Connection

create_xtable_config(...)

Create an Xtable_config (database-based)

create_dataframe_xtable_config(dataframe, object_name, ...)

Create a DataFrameXtableConfig (DataFrame-based)

import_dataframe(dataframe, object_name, ...)

Import a DataFrame directly as XTable (one-liner)

create_xattribute_config(...)

Create an Xattribute_config for hierarchical attributes

create_xview()

Create an Xview builder

import_xtable(xtable_config)

Execute an XTable import

import_xattribute(xattribute_config)

Execute an XAttribute import

list_saved_connections()

List saved connection files

list_xtables()

List existing XTable files

list_xtable_configs()

List saved XTable configurations

list_xattributes()

List existing XAttribute files

get_config_of_xtable(xtable_file_name)

Get historical import configuration for an XTable

DataFrame Import Methods:

The Importer provides two ways to import pandas DataFrames:

  1. create_dataframe_xtable_config() - For full control over configuration

  2. import_dataframe() - One-liner convenience method

import pandas as pd
from xplain.tools import Importer

importer = Importer(session)

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'amount': [100.0, 200.0, 150.0]
})

# Method 1: Full control
config = importer.create_dataframe_xtable_config(
    dataframe=df,
    object_name="Users",
    primary_key="id",
    type_overrides={"name": "CATEGORIAL"}
)
config.show_dimension_configurations()
config.set_time_dimension("created_at")  # further customization
config.import_xtable()

# Method 2: One-liner
result = importer.import_dataframe(
    dataframe=df,
    object_name="QuickUsers",
    primary_key="id",
    use_multipart=False  # or True for file upload
)

Example - Hierarchical Attributes:

importer = session.get_importer()

# Create a hierarchical attribute (e.g., geography)
attr_config = importer.create_xattribute_config(
    connection_file="my_connection",
    db_table_name="geography",
    attribute_name="Location",
    primary_key="location_id",
    hierarchy_level_names=["Country", "State", "City"],
    hierarchy_level_column_names=["country_name", "state_name", "city_name"]
)

# Import the attribute
importer.import_xattribute(attr_config)

XGenScript - Batch Processing

The xgenscript module provides functions for generating and executing batch import scripts with time-based partitioning and parallel processing.

Key Functions:

Function

Description

run_pipeline(xsession, ...)

Generate and execute a single-table pipeline

run_pipeline_multi(xsession, table_configs, ...)

Generate and execute a multi-table pipeline

generate_xgenscript(...)

Generate script JSON for a single table

generate_xgenscript_multi(table_configs, ...)

Generate script JSON for multiple tables

execute_xgenscript(xsession, script)

Execute a generated script

save_xgenscript(xsession, script, filename)

Save a script to file

Time Helper Functions:

Function

Description

generate_time_ranges(start, end, interval)

Generate time range tuples (hourly, daily, monthly, yearly)

where_clause_time(start, end, time_column, time_column_type)

Generate SQL WHERE clause for time range

get_date_suffix(dt, interval)

Get date suffix string for file naming

Example - Single Table Pipeline:

import datetime
from xplain.tools import run_pipeline

script = run_pipeline(
    xsession=session,
    configuration_file="sales.xtableconfig",
    table_name="sales",
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 12, 31),
    parent_table="Customers.xtable",
    interval="monthly",
    time_column="order_date",
    records_per_package=500000,
    max_task_threads=3,
    execute=True
)

Example - Multi-Table Pipeline:

import datetime
from xplain.tools import TableConfig, run_pipeline_multi

configs = [
    TableConfig(
        configuration_file="orders.xtableconfig",
        table_name="Orders",
        parent_table="Customers.xtable"
    ),
    TableConfig(
        configuration_file="products.xtableconfig",
        table_name="Products",
        additional_where_clause="active = true"
    ),
    TableConfig(
        configuration_file="metrics.xtableconfig",
        table_name="Metrics",
        time_column_type="epoch_ms"  # For LONG epoch milliseconds
    )
]

script = run_pipeline_multi(
    xsession=session,
    table_configs=configs,
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 12, 31),
    interval="daily",
    time_column="timestamp",
    execute=True
)

Constants:

Constant

Description

DEFAULT_RECORDS_PER_PACKAGE

Default records per package (500,000)

DEFAULT_MAX_TASK_THREADS

Default parallel threads (3)

INTERVAL_TO_LENGTH

Mapping of interval names to date suffix lengths

FUSION_LEVELS

Ordered list of fusion levels (daily, monthly, yearly)