Hoarding is a disorder characterized by difficulty in parting with possessions. It was once considered a symptom of obsessive-compulsive disorder, a mental and behavioral disorder that affects between 2% and 4% of the general population. In 2013, hoarding disorder was classified as a separate condition in the Diagnostic and Statistical Manual of Mental Disorders. Digital hoarding is just “a new version of an old psychological challenge,” Dr. Maidenberg says. With digital hoarding, however, the act of saving the file becomes an uncontrollable urge. [source]
Hoarding of digital documents leads to forgetting what is in the hoard.
Hoarding in shared directories causes security and privacy issues.
Digital theft often occurs with access to digital hoards within organizations.
Use after theft can only be stopped through mass encryption of shared directories.
Most web sites, applications (IoT, mobile, streaming and desktop), plus many operating systems, remotely hoard data.
Governments hoard through backbones and backdoors.
The rate of big data hoarding has been accelerating for years.
It cannot be stopped because of the wealth and control associated with big data and AI.
Most big data cannot be processed due to constraints on data mining, which makes the data close to useless as it ages.
Inception rapidly transforms documents and raw data into other forms.
Inception is an automated data pre-processor that can greatly speed up data conversion.
A child could create the processing scripts used to transform data into a desired form.
Inception automates transforming Microsoft Office documents into words that are quickly recorded in a memory-based database.
After completion of processing, results are inserted into a relational database to allow searching of words and phrases to locate documents.
Inception was created and enhanced over many years.
The process of transforming digital data into new forms is as old as the computer industry.
Inception was refactored and largely rewritten in a selective mixture of C and C++.
We could have quickly rewritten Inception in higher-level languages but chose to take the difficult route of coding in C and C++.
We have a great deal of experience developing software in many higher-level languages.
Refactoring Inception in any of the higher-level languages would have saved us a great deal of time.
We developed in C to directly load and process files in memory because it is fastest and provides greatest control.
We did not use C++ memory handling because it makes indirect use of memory which is slower than direct access.
We did not use C++ vectors and other STL functions because they internally control memory allocation, expansion and deallocation which reduces processing speed.
The speed difference between direct and indirect memory access becomes highly noticeable when processing millions of documents.
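The direct approach described above can be illustrated with a minimal sketch that reads a whole file into one heap buffer owned by the caller. The helper name load_file is hypothetical and is not part of the Inception code base; it only shows the general technique of loading and processing a file in memory with plain C allocation.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not an Inception function): read an entire file
 * into a single malloc'd buffer. On success returns the buffer and stores
 * its length in *size_out; the caller frees it. Returns NULL on failure. */
uint8_t *load_file(const char *path, size_t *size_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    uint8_t *buffer = NULL;
    long length = -1;

    /* Determine the file size, then rewind to the start. */
    if (fseek(fp, 0, SEEK_END) == 0 &&
        (length = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        /* One allocation; the extra byte holds a terminating NUL. */
        buffer = malloc((size_t)length + 1);
        if (buffer != NULL) {
            if (fread(buffer, 1, (size_t)length, fp) != (size_t)length) {
                free(buffer);
                buffer = NULL;
            } else {
                buffer[length] = '\0';
                *size_out = (size_t)length;
            }
        }
    }

    fclose(fp);
    return buffer;
}
```

Once loaded, the buffer can be scanned and rewritten with pointer arithmetic, with no container managing the memory on the caller's behalf.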
We utilized the C++ object-oriented framework to make it easier to maintain Inception.
We created Inception for speed and control rather than cobbling together a series of open source applications.
While making use of open source applications accelerates creation of products and services, it often causes more problems than it solves.
The languages used to create open source applications are often poorly chosen.
Reviews of the code bases of many open source products usually show inefficiencies; flawed architecture, design and implementation; and dismal commenting.
The Inception code base is well constructed, well commented and fairly easy to understand.
We can enhance Inception to automate processing of other document types as needed in the future.
Inception was refined to achieve the highest processing speed.
The scripting command set has been refreshed and expanded.
A Linux API is provided through use of shared and static libraries to enable custom software to make use of Inception.
Async process threading and automated directory processing have been added.
A memory-based NoSQL engine was created and used to retain processed documents until all data can be retained in a relational database.
We make use of the memory-based NoSQL engine to accelerate the speed of document processing.
Inception automates document transformation.
Files and directory chains that are transferred to an Inception input queue are automatically processed in FIFO sequence, then written to an output queue.
Automatic detection of file type, then document type, results in correct processing.
High processing speed reduces the total number of threads required for processing.
Inception verifies that files and directory chains transferred into an input queue are fully written before moving them to thread-processing space.
Detecting open file handles associated with a file or directory chain is critical for automating document processing.
Without this validation, files and directories could be moved while they are still being written.
This problem was discovered while developing the Kryptera HSM.
Tracing down a solution required R&D that was specific to Linux.
A variable delay is placed after a file or directory chain is completely written.
The delay is only needed if cached writes occur after file writes are complete.
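The source does not describe how Inception detects open file handles. One common Linux technique, shown here purely as a sketch under that assumption, is to scan the /proc/<pid>/fd symbolic links and compare each link target with the resolved path of the file. The function name file_is_open is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not an Inception function). Returns 1 if any process
 * whose /proc/<pid>/fd directory is readable holds `path` open, 0 if none
 * does, and -1 if the path cannot be resolved or /proc is unavailable. */
int file_is_open(const char *path)
{
    char target[PATH_MAX];
    if (realpath(path, target) == NULL)
        return -1;

    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int found = 0;
    struct dirent *pe;
    while (!found && (pe = readdir(proc)) != NULL) {
        /* Process directories in /proc have numeric names. */
        if (!isdigit((unsigned char)pe->d_name[0]))
            continue;

        char fd_dir[288];
        snprintf(fd_dir, sizeof fd_dir, "/proc/%s/fd", pe->d_name);
        DIR *fds = opendir(fd_dir);
        if (fds == NULL)
            continue; /* no permission, or the process has exited */

        struct dirent *fe;
        while ((fe = readdir(fds)) != NULL) {
            char link_path[560];
            char link_target[PATH_MAX];
            snprintf(link_path, sizeof link_path, "%s/%s", fd_dir, fe->d_name);
            ssize_t n = readlink(link_path, link_target, sizeof link_target - 1);
            if (n < 0)
                continue; /* "." and ".." are not symbolic links */
            link_target[n] = '\0';
            if (strcmp(link_target, target) == 0) {
                found = 1;
                break;
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return found;
}
```

A check like this only sees processes the caller has permission to inspect, and a writer could open the file again immediately after the check, which is one reason a delay after the last write can still be useful.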
Inception builds and runs on Debian and related trees such as Ubuntu and Devuan, plus Red Hat, CentOS and Fedora.
Inception will run as an initd or systemd service, a standalone service-like application, or a standalone application to create processing scripts.
Inception shared and static libraries can be used by custom software to automate document and data transformation.
We chose to develop Inception to operate under Linux rather than Windows to ensure the highest processing speed and stability.
Inception Service Startup Flags:
./InceptionApplication -folder_input input/ -folder_output output/ -folder_process process/ -folder_script script/ -folder_error error/ -folder_log log/
This starts Inception as a standalone application, which can also be run as a systemd service. Pass -start_daemon to start Inception as a background initd daemon (service). The design allows shared directories to be used for the input and output queues, and higher-speed storage to be used for thread-processing space.
Inception Script Creation Flags:
./InceptionApplication -test_source path/source_file -test_target path/target_file -test_script path/script_file
The flags are used to test creation or refinement of a processing script.
Load the source, script and target files into a text editor.
Change the script, then start command-line processing to update the target for review.
The key feature of Inception lies with the user-defined processing scripts used to automate transformation of documents into desired formats.
The Inception scripting language contains many commands separated into groups:
hide <tag_open>data</tag_open>,
move down till Tag Level == tag_level_to
then start a new block using the passed parameters
making sure to include the extracted data.
A five-line script using three commands transforms Word XML documents to text.
EliminateContent|<?xml|<w:t>|1|
SwapStrings|</w:t>| _TE_|
SwapStrings|<w:t>|_TB_|
EliminateContentAll|_TE_|_TB_|1|
EliminateContent| _TE_|</w:document>|1|
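The exact semantics of the script commands are not documented here. Assuming that SwapStrings replaces every occurrence of its first parameter with its second, the step could be sketched in C as an in-place replacement on a buffer with a known used size and allocated capacity. The function name swap_strings and its signature are hypothetical, not the Inception implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a SwapStrings-like step (assumed semantics:
 * replace every occurrence of `from` with `to`). Works in place on buf,
 * where *size is the number of bytes in use and capacity is the allocated
 * size. Returns 0 on success, or -1 if `from` is empty or a replacement
 * would exceed capacity (earlier replacements remain applied). */
int swap_strings(uint8_t *buf, uint32_t *size, uint64_t capacity,
                 const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    if (from_len == 0)
        return -1;

    size_t pos = 0;
    while (pos + from_len <= *size) {
        if (memcmp(buf + pos, from, from_len) != 0) {
            pos++;
            continue;
        }
        /* Check that the buffer can hold the result of this replacement. */
        if ((uint64_t)*size - from_len + to_len > capacity)
            return -1;

        /* Shift the tail, then copy the replacement into the gap. */
        size_t tail = *size - (pos + from_len);
        memmove(buf + pos + to_len, buf + pos + from_len, tail);
        memcpy(buf + pos, to, to_len);
        *size = (uint32_t)(*size - from_len + to_len);
        pos += to_len; /* skip the inserted text so it is not rescanned */
    }
    return 0;
}
```

Applied to the two SwapStrings lines of the Word script, the buffer `<w:t>Hello</w:t>` would become `_TB_Hello _TE_` under these assumptions, leaving marker pairs for the later commands to work with.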
A six-line script using four commands transforms Excel XML documents to text.
RemoveBetween|<t|>|
EliminateContent|<?xml|<t>|1|
SwapStrings|</t>| _TE_|
SwapStrings|<t>|_TB_|
EliminateContentAll|_TE_|_TB_|1|
EliminateContent| _TE_|</sst>|1|
A six-line script using four commands transforms each PowerPoint XML slide to text.
RemoveWithout|<a:t>|<?xml><a:t></a:t></p:sld>|
EliminateContent|<?xml|<a:t>|1|
SwapStrings|</a:t>| _TE_|
SwapStrings|<a:t>|_TB_|
EliminateContentAll|_TE_|_TB_|1|
EliminateContent| _TE_|</p:sld>|1|
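Each script line shown above has the shape Command|parameter|parameter|...|, with a trailing delimiter. A minimal sketch of splitting such a line into its command name and parameters follows. The ScriptLine type, the parse_script_line function and the limit of eight parameters are hypothetical; how Inception itself parses scripts, and whether a parameter can contain a literal '|', is not stated in the source.

```c
#include <string.h>

#define MAX_SCRIPT_PARAMS 8 /* assumed limit, for illustration only */

/* Hypothetical container for one parsed script line. */
typedef struct {
    char *command;
    char *params[MAX_SCRIPT_PARAMS];
    int param_count;
} ScriptLine;

/* Splits `line` in place at each '|'. Leading spaces inside a parameter
 * (as in " _TE_") are preserved. Text after the final '|' is ignored.
 * Returns 0 on success, -1 if the line has no command or too many
 * parameters. */
int parse_script_line(char *line, ScriptLine *out)
{
    out->command = line;
    out->param_count = 0;

    char *bar = strchr(line, '|');
    if (bar == NULL || bar == line)
        return -1; /* no delimiter, or empty command name */
    *bar = '\0';

    char *field = bar + 1;
    while ((bar = strchr(field, '|')) != NULL) {
        if (out->param_count == MAX_SCRIPT_PARAMS)
            return -1;
        *bar = '\0';
        out->params[out->param_count++] = field;
        field = bar + 1;
    }
    return 0;
}
```

For the line `EliminateContent| _TE_|</p:sld>|1|`, this yields the command EliminateContent and three parameters; the meaning of each parameter is left to the command that consumes it.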
Five API functions are provided that enable custom software to automate document transformation.
The functions differ on data passed to and from the API.
API naming is based on the International Civil Aviation Organization (ICAO) alphabet.
/**
 * \brief Load source and processing script, write target, deallocate source and script memory, caller has target file.
 *
 * Buffers pSourcePath & pProcessingScript, calls Delta, writes pTargetPath,
 * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
 *
 * \param pSourcePath
 * \param pTargetPath
 * \param pProcessingScript
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Alpha( char *pSourcePath, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
/**
 * \brief Load processing script, write target, deallocate script memory, caller has target file and deallocates pSourceBuffer.
 *
 * Buffers pProcessingScript, calls Delta, writes pTargetPath,
 * deallocates pProcessingScriptBuffer, does not return pSourceBuffer.
 *
 * \param pSourceBuffer
 * \param pTargetPath
 * \param pProcessingScript
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Bravo( uint8_t *pSourceBuffer, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
/**
 * \brief Load processing script, deallocate script, returns target (pSourceBuffer), caller writes target
 * and deallocates pSourceBuffer.
 *
 * Buffers pProcessingScript, calls Delta, deallocates pProcessingScriptBuffer, returns altered pSourceBuffer.
 *
 * \param pSourceBuffer
 * \param pProcessingScript
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Charlie( uint8_t *pSourceBuffer, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
/**
 * \brief Returns target (pSourceBuffer), caller has target (pSourceBuffer) and deallocates pSourceBuffer
 * and pProcessingScriptBuffer.
 *
 * Delta is the class called by all others.
 *
 * \param pSourceBuffer
 * \param pProcessingScriptBuffer
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \param pMaxSourceBuffer: size of pSourceBuffer when malloc'd
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Delta( uint8_t *pSourceBuffer, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer, uint64_t pMaxSourceBuffer );
/**
 * \brief processing script is populated, load source, write target, deallocate source memory, caller has target file.
 *
 * Buffers pSourcePath & pProcessingScript, calls Delta, writes pTargetPath,
 * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
 *
 * \param pSourcePath
 * \param pTargetPath
 * \param pProcessingScriptBuffer
 * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
 * \return FAILURE_OKAY on success else various FAILURE_* on error
 */
int32_t Echo( char *pSourcePath, char *pTargetPath, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer );
Inception can be deployed on a virtualized computer such as Docker.
Docker containers are easy to create, configure and use.
Docker supports Volumes which enable use of Inception input and output queues.
Kubernetes can be used to control 1-n Docker containers.
Intel Core i3-4150, 4 logical CPU cores (2 physical, Hyper-Threading) @ 3.50 GHz / 800 MHz,
1 TB 7200 RPM SATA HDD, 8 async threads limit.
Inception is designed to fully automate conversion of documents into desired formats.
This includes automated processing of directory chains.
Inception will perform in a straightforward manner 24/7.
None of the competing products provides similar features.
All competing products are complex in construction and use.
Inception solves data hoarding issues through rapid automation of document transformation.
Transformed Office documents are parsed, with words separated then later imported into a database for indexing.
Automated processing of shared directory chains ensures a faster return on investment for clients.
Copyright © Kryptera