5. Java Interface
The purpose of the Java interface is to extend the algorithmic capabilities of the ObjectAnalytics engine.
With this interface, you can inject algorithms deep into the core of the database engine. The algorithms come to the data instead of the data coming to the algorithms: no moving tons of data, no expensive transformation of data into formats in which it can be sent to and processed by an algorithm elsewhere. Algorithms sit as close to the data as possible, allowing for maximum performance. All algorithms implemented by Xplain Data itself also go through this interface.
As described in the section on the technical overview, the Java interface is not an “external” interface: it allows you to extend the Java core of the database at the source code level. Doing so requires both software development skills and algorithmic skills.
The Java interface has not yet been officially released and is still in beta status. If you want to implement an advanced algorithm and execute it massively in parallel on millions of stored object instances, get in contact with the Xplain Data team; we will help you do this. To give you an idea of what is possible, some information is provided below. Detailed documentation will follow as soon as the interface is officially released.
Iterating Objects
Typically, statistical algorithms iterate over the rows of a table to collect the required statistics. As a data scientist, you have likely implemented such algorithms yourself: you loaded a CSV file or a database table and cycled through all of its rows, once or repeatedly.
Instead of iterating over rows in a table, “ObjectAnalytics” means iterating through all objects (all instances of an object) and collecting the required data from those object instances.
In typical Java style, you can request an “object instance iterator” and use it to cycle through all instances of an object. An object instance may, for example, be an individual patient with all information attached to it: diagnoses, prescriptions, procedures, and so on. There are means to navigate such an object, access its sub-objects and member fields (dimensions and attributes), and analyze elements in relation to each other (e.g., diagnoses relative to prescriptions). All information related to a patient is readily available from one object instance; there is no need to expensively collect data from different tables. This includes artifacts such as relative time axes or aggregation dimensions, which you may have defined to set elements of the object in relation to each other.
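To give a first impression of what such an iteration might look like, here is a minimal sketch. Every type and method name in it (ObjectInstance, ObjectStore, attribute, subObjects, and so on) is an illustrative assumption, not the released API; it merely mirrors the style of iteration described above.

    // Illustrative sketch only: all types and method names below are assumptions,
    // not the released Xplain Data API.
    import java.util.Iterator;

    // One fully materialized object instance, e.g. one patient with all of its
    // diagnoses, prescriptions, procedures attached.
    interface ObjectInstance {
        String attribute(String name);                     // member field, e.g. "Gender"
        Iterable<ObjectInstance> subObjects(String name);  // sub-object, e.g. "Diagnoses"
    }

    // Hypothetical handle to a stored object such as "Patient".
    interface ObjectStore {
        Iterator<ObjectInstance> objectInstanceIterator();
    }

    class IterationSketch {
        // Count patients carrying at least one type 2 diabetes diagnosis (ICD-10 E11*).
        static long countDiabetesPatients(ObjectStore patients) {
            long count = 0;
            Iterator<ObjectInstance> it = patients.objectInstanceIterator();
            while (it.hasNext()) {
                ObjectInstance patient = it.next();
                for (ObjectInstance diagnosis : patient.subObjects("Diagnoses")) {
                    if (diagnosis.attribute("ICD10").startsWith("E11")) {
                        count++;
                        break;  // count each patient at most once
                    }
                }
            }
            return count;
        }
    }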
Object Map Reduce
Rather than using an “iterator”, however, you will usually want to use our “Object Map Reduce” interface. That interface requires you to specify essentially two things (by implementing the corresponding methods of the Java interface), as illustrated in the sketch after this list:
1. The operation to be executed on each single object instance, i.e., how each instance contributes to the statistics being collected (the map step).
2. How results collected on subsets of object instances (subsets of patients) are combined into a joint result (the reduce step).
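As a rough illustration of the shape such an implementation could take, the sketch below (reusing the hypothetical ObjectInstance interface from the previous sketch) defines a made-up ObjectMapReduce interface and implements it to count how often each diagnosis code occurs across all patients. The type and method names are assumptions, not the released API; they only mirror the two responsibilities listed above.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical shape of the Object Map Reduce interface: an empty partial
    // result per worker, a per-instance map step, and a reduce step that merges
    // partial results. All names are placeholders, not the released API.
    interface ObjectMapReduce<R> {
        R newResult();                                  // empty partial result
        void map(ObjectInstance instance, R partial);   // contribution of one instance
        R reduce(R left, R right);                      // merge two partial results
    }

    // Example: count how often each diagnosis code occurs across all patients.
    class DiagnosisCounter implements ObjectMapReduce<Map<String, Long>> {

        @Override
        public Map<String, Long> newResult() {
            return new HashMap<>();
        }

        @Override
        public void map(ObjectInstance patient, Map<String, Long> counts) {
            for (ObjectInstance diagnosis : patient.subObjects("Diagnoses")) {
                counts.merge(diagnosis.attribute("ICD10"), 1L, Long::sum);
            }
        }

        @Override
        public Map<String, Long> reduce(Map<String, Long> left, Map<String, Long> right) {
            right.forEach((code, count) -> left.merge(code, count, Long::sum));
            return left;
        }
    }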
Once you have implemented the required methods of the interface, the object database engine executes the specified algorithm massively in parallel on all cores of a multiprocessor machine. The problem is dynamically split into sub-tasks corresponding to subsets of objects, and the sub-results are finally reduced to a joint result. You do not need to care about multi-threaded execution or load balancing between cores; all of this is done for you. Your algorithm will immediately execute on a potentially huge machine, utilizing all available resources.
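Continuing the purely hypothetical sketch above, handing such an algorithm to the engine might then come down to a single call. The engine handle and its execute method below are assumptions for illustration only, not the released API.

    // Hypothetical call: the engine splits the "Patient" instances across all cores,
    // invokes map() on every instance, and merges the partial results via reduce().
    Map<String, Long> diagnosisCounts = engine.execute("Patient", new DiagnosisCounter());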
Using the Java and Object Map Reduce interfaces, there are two primary benefits for you:
1. You can conveniently access whole objects, readily available without collecting data from different tables (or performing potentially expensive joins).
2. All parallel resources of multiprocessor machines are immediately available to you to scale up your algorithms.