Inception is a tool that automates the transformation of documents into desired formats.
Three file paths and names are passed to Inception: the source file, the script file, and the target file.
Inception loads the source and script files into memory, then alters the source data by applying each script line in sequence.
The script can contain commands to change, eliminate, put, reduce, set, remove, swap, transfer, and transform the source data.
The target file is written after all script lines have been processed.
Inception has been used in various forms to automate transformation of:
Inception can be used directly as a service or standalone application, and can also be called from custom software through the Inception API.
If warranted, we can customize Inception for specific uses.
Data harvesting will keep increasing for years to come.
Harvested data is mostly stored in non-relational databases in the cloud.
Most harvested data goes unused because of a lack of data-mining software and the limited execution speed of existing mining software.
Private harvested data is increasingly stolen, used and sold without permission.
Backing up files can result in a digital hoarding problem.
Information within hoarded documents is often forgotten.
Shared directories containing hoarded documents cause a security and privacy nightmare.
Theft occurs when access is provided to digital hoards within organizations.
Theft is for personal, corporate and government gain; some theft simply adds to personal digital hoards.
Inception and database services execute in the background.
The process of transforming digital data into new forms is as old as the computer industry.
Inception was created, recreated, rewritten and perpetually enhanced over many years.
Inception was refactored and largely rewritten in a selective mixture of C and C++.
We have a great deal of experience developing software in several low- to high-level languages.
We could have quickly rewritten Inception in higher-level languages.
We chose to take the difficult route of coding in C and C++.
Refactoring Inception in any of the higher-level languages would have saved us a great deal of time and cost.
We could have made use of open-source applications to save us even more time and cost.
Using open-source applications, which are usually written in higher-level languages, could:
Inception was refined to achieve the highest processing speed.
We chose to develop key portions of Inception in C so that files are loaded and processed directly in memory, because C is the fastest option and provides the greatest control.
We did not use C++ memory handling because it makes indirect use of memory, which is slower than direct access.
Direct memory access is faster in situations where CPU caching is ineffective due to multi-threaded processing.
We did not use C++ vectors and other associated functions.
The speed difference between direct and indirect memory access becomes noticeable when processing thousands to millions of documents.
We created Inception for speed, stability and user control.
We utilized the C++ object-oriented framework to make it easier to isolate functionality and maintain Inception.
The Inception code base is well constructed, well commented and documented, and fairly easy to understand.
We can enhance Inception to automate processing of other document types as needed in the future.
Inception automates document transformation in two ways: from within custom software that uses the Inception API, and as a Linux service that automates document processing.
Files and directory chains transferred into an input queue must be fully written before use.
This feature is critical for document processing automation.
Without validation, files and directories could be moved or loaded into memory while still being written.
This problem was discovered while developing the Kryptera HSM.
Tracing down a solution required low-level R&D that was specific to Linux.
A variable delay is placed after a file or directory chain is completely written.
The delay is only needed if cached writes occur after file writes are complete.
Inception builds and runs on Debian and related distributions such as Ubuntu and Devuan, plus Red Hat, CentOS and Fedora.
It would be fairly simple to refine Inception to build and execute under Unix.
Inception will run as an initd or systemd service, a standalone service-like application, or a standalone application to create processing scripts.
Inception shared and static libraries can be used by custom software to automate document and data transformation.
We chose to develop Inception to operate under Linux rather than Windows to ensure the highest processing speed and stability.
Windows and its applications and services consume memory, CPU cores, file storage and network bandwidth, which would reduce Inception's processing speed.
Refining Inception to build and execute under Windows will occur in the near future.
Inception service startup flags.
./ServiceInception -folder_input input/ -folder_output output/ -folder_process process/ -folder_script script/ -folder_error error/ -folder_log log/
This starts Inception as a standalone application which can also be used as a systemd service.
-start_daemon is passed to start Inception as a background initd daemon (service). The design allows shared directories to be used for input and output queues, and allows use of higher-speed storage for thread-processing space.
PostgreSQL database microservice startup flags.
./ServicePostgreSQL -folder_input input/ -folder_log log/ -encrypted_auth PATH_FILENAME
Pass -start_daemon to start the service as a background initd daemon. Set -folder_input to the same directory as Inception's -folder_output.
A utility is provided to encrypt database-related information and preferences.
Inception script creation flags.
./ServiceInception -test_source path/source_file -test_target path/target_file -test_script path/script_file
The flags are used to test creation or refinement of a processing script.
Load the source, script and target files into a text editor.
Change the script, then rerun the command-line processing to update the target for review.
A key Inception feature is user-defined processing scripts that are used to automate document transformation into desired formats.
The Inception scripting language contains many commands that can be separated into groups:
hide <tag_open>data</tag_open>, move down till Tag Level == tag_level_to, then start a new block using the passed parameters, making sure to include the extracted data.
Our scripting language can be extended in the future to better process any form of data and other document formats.
We added functionality to automate transformation of Microsoft Office documents into words that are inserted into a relational database.
Five API functions are provided that enable custom software to automate document transformation.
The functions differ in the data that is passed to and from the API.
API naming is based on the International Civil Aviation Organization (ICAO) alphabet.
/**
 * \brief Load source and processing script, write target, deallocate source and script memory, caller has target file.
 *
 * Buffers pSourcePath & pProcessingScript, calls Delta, writes pTargetPath,
 * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
 *
 * \param pSourcePath
 * \param pTargetPath
 * \param pProcessingScript
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Alpha( char *pSourcePath, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
/**
 * \brief Load processing script, write target, deallocate script memory, caller has target file and deallocates pSourceBuffer.
 *
 * Buffers pProcessingScript, calls Delta, writes pTargetPath,
 * deallocates pProcessingScriptBuffer, does not return pSourceBuffer.
 *
 * \param pSourceBuffer
 * \param pTargetPath
 * \param pProcessingScript
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Bravo( uint8_t *pSourceBuffer, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
/**
 * \brief Load processing script, deallocate script, returns target (pSourceBuffer), caller writes target
 * and deallocates pSourceBuffer.
 *
 * Buffers pProcessingScript, calls Delta, deallocates pProcessingScriptBuffer, returns altered pSourceBuffer.
 *
 * \param pSourceBuffer
 * \param pProcessingScript
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Charlie( uint8_t *pSourceBuffer, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
/**
 * \brief Returns target (pSourceBuffer), caller has target (pSourceBuffer) and deallocates pSourceBuffer
 * and pProcessingScriptBuffer.
 *
 * Delta is the function called by all the others.
 *
 * \param pSourceBuffer
 * \param pProcessingScriptBuffer
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \param pMaxSourceBuffer size of pSourceBuffer when malloc'd
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Delta( uint8_t *pSourceBuffer, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer, uint64_t pMaxSourceBuffer );
/**
 * \brief Processing script is populated, load source, write target, deallocate source memory, caller has target file.
 *
 * Buffers pSourcePath, calls Delta, writes pTargetPath,
 * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
 *
 * \param pSourcePath
 * \param pTargetPath
 * \param pProcessingScriptBuffer
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Echo( char *pSourcePath, char *pTargetPath, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer );
Inception can be deployed in containers such as Docker.
Docker containers are easy to create, configure and use.
Docker supports volumes, which can hold Inception's input and output queues.
Kubernetes can be used to manage any number of Docker containers.
Intel Core i3-4150, 4 logical CPUs (2 cores, 4 threads) @ 3.50 GHz / 800 MHz, 1 TB 7200 RPM SATA HDD, limit of 8 async threads.
Documents transformed into text.
Documents transformed into text, split into words, retained in memory, with memory tables saved as binary files.
PostgreSQL database microservice loading saved tables into memory, then writing results to a relational database.
Current and Future Enhancements.
Alteryx
ShinyDocs
OpenRefine
Trifacta Wrangler
Drake
TIBCO Clarity
WinPure
Data Ladder
Quadient Data Cleaner
Cloudingo
Reifier
IBM InfoSphere QualityStage
Inception is designed to fully automate conversion of documents into desired formats.
Inception solves data hoarding issues through rapid automation of document transformation.
Transformed Office documents are parsed and split into words, which are then imported into a database for indexing.
Automated processing of shared directory chains ensures a faster return on investment for clients.
Copyright © Kryptera