Introduction to the Concept of TwinDoc

Machine learning-based document search and classification solution: TwinDoc

TwinDoc is the only solution in Korea that provides document search, analysis, and classification services based on machine learning technology. Unlike traditional document search solutions, TwinDoc does not depend on a “dictionary” that must be constantly extended and maintained.

Recent algorithms such as TextCNN and BERT can be used to produce accurate results, and new algorithms can be easily applied.

TextCNN: A deep learning-based NLU (natural language understanding) technique that represents each document as two-dimensional, image-like data built from Word2Vec word vectors.
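To make the "two-dimensional data" idea concrete, here is a minimal sketch, not TwinDoc's actual implementation. It stacks one word vector per token into a matrix (rows are tokens, columns are embedding dimensions) and slides a single convolution filter over it, followed by max pooling, as a TextCNN-style model does. The tiny EMBEDDINGS table, the vector values, and the function names are all hypothetical; a real system would load trained Word2Vec vectors and learn the filter weights.

```python
# Hypothetical 3-dimensional word vectors (illustrative values only).
# A real system would load vectors from a trained Word2Vec model.
EMBEDDINGS = {
    "loan": [0.9, 0.1, 0.0],
    "application": [0.7, 0.3, 0.1],
    "form": [0.2, 0.8, 0.1],
}
PAD = [0.0, 0.0, 0.0]  # used for unknown words and for padding


def document_matrix(tokens: list[str], max_len: int) -> list[list[float]]:
    """Turn a token list into a max_len x 3 matrix, truncating or zero-padding."""
    rows = [EMBEDDINGS.get(t, PAD) for t in tokens[:max_len]]
    rows += [PAD] * (max_len - len(rows))
    return rows


def conv_max_pool(matrix: list[list[float]], kernel: list[list[float]]) -> float:
    """Slide one filter over consecutive rows, apply ReLU, then max-pool."""
    h = len(kernel)  # number of consecutive tokens the filter covers
    if len(matrix) < h:
        raise ValueError("matrix has fewer rows than the filter height")
    features = []
    for i in range(len(matrix) - h + 1):
        window = matrix[i:i + h]
        s = sum(w * x
                for k_row, x_row in zip(kernel, window)
                for w, x in zip(k_row, x_row))
        features.append(max(0.0, s))  # ReLU activation
    return max(features)  # max-over-time pooling


# A filter covering two tokens that responds to the first embedding dimension.
matrix = document_matrix(["loan", "application", "form"], max_len=4)
feature = conv_max_pool(matrix, kernel=[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```

In a full model, many such filters of different heights produce one pooled feature each, and the resulting feature vector feeds a classifier layer.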

BERT: BERT is a pre-trained language model whose language knowledge, learned mainly from written text, is roughly comparable to that of a 15-year-old. It is currently the most popular text analysis algorithm in the world, but because it is trained on written language, it needs additional training to handle colloquial language.

We have applied a BERT model that has already been trained on spoken language to projects in the financial sector and have proven its performance.

Configuration of TwinDoc

TwinDoc is a machine learning-based document classification and retrieval solution that embeds words or documents and represents them as vectors in the embedding space. Because no dictionary is required, it saves the resources otherwise spent on dictionary maintenance and management. Algorithms are applied as they are, which makes training easy and allows the latest algorithms to be adopted flexibly.
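The following is a minimal sketch of how dictionary-free retrieval over embedded documents can work; it is an illustration under stated assumptions, not TwinDoc's code. Each document is mapped to a vector, and a query is answered by ranking documents by cosine similarity to the query vector. The embed_word function here is a hypothetical stand-in that derives a fixed vector from a hash of the word, so it only rewards shared words; a real system would use learned embeddings (for example from Word2Vec or BERT) so that related words land close together. All names and the DIM value are assumptions.

```python
import hashlib
import math

DIM = 32  # hypothetical embedding size (one value per SHA-256 digest byte)


def embed_word(word: str) -> list[float]:
    """Toy stand-in for a learned embedding: a deterministic vector per word."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest]  # map bytes to [-1, 1]


def embed_document(text: str) -> list[float]:
    """Embed a document as the mean of its word vectors."""
    words = text.split()
    if not words:
        return [0.0] * DIM
    vectors = [embed_word(w) for w in words]
    return [sum(col) / len(words) for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k (score, document) pairs, most similar first."""
    q = embed_document(query)
    scored = [(cosine(q, embed_document(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


documents = [
    "loan application form",
    "annual travel expense report",
    "customer complaint letter",
]
results = search("loan application form", documents, top_k=2)
```

Nothing in this flow consults a hand-maintained word list: changing the embedding model changes the vectors, while the search code stays the same.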

TwinDoc Workflow

TwinDoc is a machine learning-based solution that lets you search, analyze, and classify documents without depending on dictionaries, delivering in a single product what previously required several complex services.

Unique Features of TwinDoc – Doc’s DETOX

TwinDoc overcomes the limitations of traditional solutions: dictionary management, poor scalability, and solution fragmentation.

Doc’s DETOX-TwinDoc

Existing Solution vs TwinDoc


