Chatbot Models

Chatbots can be classified along several criteria. Here we classify them according to the design techniques used in commercial platforms.

Retrieval-based Model:

In this model, chatbots are trained on a set of examples and their possible outcomes. For every question, the bot finds the most relevant answer from the set of all possible answers. Although the bot cannot generate new answers, it can perform fairly well if it is trained on a dataset that is large enough, complete, and carefully pre-processed. The algorithms used range from simple ones (such as keyword matching) to complex ones such as machine learning algorithms. There is also no issue with language and grammar, because the answers are pre-determined and so cannot be syntactically wrong.
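
As a rough illustration of the simplest case mentioned above (keyword matching), the sketch below picks the stored answer whose example question shares the most words with the user's question. The example pairs and the function names are hypothetical and are not taken from any of the platforms named here.

```python
import re

# Hypothetical question/answer pairs; a real bot would load these from its training set.
EXAMPLES = {
    "what are your opening hours": "We are open from 9 am to 5 pm.",
    "how can i reset my password": "Use the 'Forgot password' link on the login page.",
    "where is my order": "You can track your order from the Orders page.",
}

def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(question, examples=EXAMPLES):
    """Return the stored answer whose example question shares the most keywords."""
    query = tokenize(question)
    best = max(examples, key=lambda example: len(query & tokenize(example)))
    if not query & tokenize(best):
        return None  # no keyword overlap at all: the bot has no stored answer
    return examples[best]
```

The bot can only ever return one of the pre-written answers, which is why its output is always grammatical but never new.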

Ex.: Dialogflow from Google, Watson from IBM, etc.

Generative Model:

The generative model (deep learning based) is suitable for building open-domain chatbots.

Ex.: BlenderBot from Facebook, Woebot: AI for mental health, etc.

Knowledge-base and understanding Model:

Platforms that work with the knowledge-base and understanding model produce the most accurate and efficient chatbots. They are suitable for building enterprise chatbots. Once these technologies are integrated with deep learning (DL), they will become the dominant approach among chatbot services.

Ex.: AlKhwarizmi

Major NLP Engines in AlKhwarizmi

Morphological analyzer & generator, Spell checker, Parser.

Morphological analyzer & generator:

The morphological analyzer & generator is a vital component of almost every NLP processor.

The analyzer reduces words to their common form.

The generator works in the opposite direction, generating words in their final inflected form as they appear in running text.

The Analyzer & Generator Engine.

It is based on this equation:

WordToken = Prefix + ( stem || Irregular stem ) + Suffix

AlKhwarizmi developed its analyzer & generator engine around the idea of stripping and concatenation, so that:

Stemmer = word token stripping

Generator = affix - concatenation

To reduce a word token to its stem, the stemmer strips it of all prefixes and suffixes, bringing it back to its original uninflected form. The generator performs the reverse operation: it concatenates the stripped affixes back onto the stem, producing a final word with its full morphological representation.
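
The stripping and concatenation idea can be sketched in a few lines. This is only a toy illustration under assumed English affix lists; it does not handle irregular stems and is not the AlKhwarizmi engine, and the names `strip_token` and `generate` are hypothetical.

```python
# Toy English affix lists, assumed for illustration only.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")

def strip_token(token):
    """Stemmer: split a word token into (prefix, stem, suffix) by stripping affixes."""
    # Only strip an affix if at least three characters would remain.
    prefix = next(
        (p for p in PREFIXES if token.startswith(p) and len(token) > len(p) + 2), ""
    )
    rest = token[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 2), ""
    )
    stem = rest[: len(rest) - len(suffix)]
    return prefix, stem, suffix

def generate(prefix, stem, suffix):
    """Generator: concatenate the affixes back onto the stem."""
    return prefix + stem + suffix
```

Because the generator is the exact inverse of the stemmer here, `generate(*strip_token(word))` gives back the original word token.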

Spell checker:

Employing the results of our morphological analyzer, our speller is designed to offer the user appropriate corrections for a misspelled word, ordered by their likelihood.
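
To give a feel for ranked suggestions, the sketch below orders candidate corrections with Python's standard `difflib` module. It is an assumption-laden stand-in: string similarity is used in place of a real likelihood model, the morphological analyzer is not involved, and the lexicon is hypothetical.

```python
import difflib

# Hypothetical mini-lexicon; a real speller would draw on the full lexical database.
LEXICON = ["receive", "recipe", "relieve", "deceive", "believe"]

def suggest(word, lexicon=LEXICON, max_suggestions=3):
    """Return up to max_suggestions lexicon entries, most similar first."""
    return difflib.get_close_matches(
        word.lower(), lexicon, n=max_suggestions, cutoff=0.6
    )
```

Calling `suggest("recieve")` returns a short list of candidates, with the closest matches first; a word far from every lexicon entry yields an empty list.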

Parser:

The parser is the most critical part of a Natural Language Processing system.

It decomposes the input sentence into its syntactical constituents.

It resolves different types of ambiguity, namely those ambiguities related to syntax, parts of speech, and word sense.

A parser is typically composed of three fundamental parts.

The Parsing Engine.

A distinguishing feature of our parsing engine is its reliance on a unique control strategy and an innovative preferential scheme.

The control strategy combines a MultiStack parser with a deterministic parser, which reduces the over-generation resulting from the MultiStack parser and prevents the structural determinism caused by the deterministic parser.

The preferential scheme is applied to further restrict the number of parsing results. This preferential scheme can be applied either during structure formation or after the parsing has been completed.

The Formal Grammar.


The Lexical DataBase.

To embed multi-linguality at the lexical level, generic data structures universal to all languages have been used. Our lexical database does not handle lexical entries as separate stand-alone items, but as an intricate, entangled forest of interrelations. It draws relations between verbs and their derivatives, synonym relations, etc.

Knowledge processing system

The knowledge processing system combines knowledge representation (ontology) and reasoning methods with a knowledge graph for acquiring knowledge.

An ontology is a semantic data model: the knowledge representation medium in which we store knowledge.

There are three main components to an ontology:

  • Entities
  • Ontology Tree
  • Facts


Entities

  • Classes (Concepts)
  • Individuals
  • Relations (Verbs)
  • Frames
  • Properties (Concepts)

Ontology Tree


A fact is a unit of knowledge: any Property related to a Concept or Frame, together with its value, is a fact. A Frame itself is also a fact.

Fact (semantic triple) = Property + (Concept or Frame) + Value
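
One possible way to hold such triples in code is shown below. The `Fact` type, its field names, and the sample facts are assumptions made for illustration; the platform's own storage format is not described here.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """A semantic triple in the order used above: Property, Concept or Frame, Value."""
    prop: str     # the Property, e.g. "capital"
    subject: str  # the Concept or Frame the property belongs to
    value: str    # the value of that property

# Hypothetical sample facts.
FACTS = [
    Fact("capital", "Egypt", "Cairo"),
    Fact("continent", "Egypt", "Africa"),
]

def lookup(facts, prop, subject):
    """Return every value recorded for the given property of the given subject."""
    return [fact.value for fact in facts if fact.prop == prop and fact.subject == subject]
```

For example, `lookup(FACTS, "capital", "Egypt")` collects the values stored for that property, and an unknown property simply yields an empty list.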

Knowledge Graph Engine:

A knowledge graph is a type of knowledge base. It was introduced by Google, which used it to enhance its search engine's results and to answer direct spoken questions in Google Assistant.

The figure shows the info panel, presented to users in an infobox next to the Google search results, with information gathered from a variety of sources.

A knowledge graph builds on graph theory (mathematics): entities are represented as nodes, and properties as edges.

In AlKhwarizmi we convert our ontology to a graph and process queries over the graph to answer user inquiries.
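
A minimal sketch of the nodes-and-edges idea follows, assuming facts arrive as (entity, property, value) triples. The data, the function names, and the path-style query are hypothetical and only illustrate how a question can be answered by walking edges.

```python
# Hypothetical facts, written as (entity, property, value) for illustration.
TRIPLES = [
    ("Cairo", "capital_of", "Egypt"),
    ("Egypt", "located_in", "Africa"),
    ("Nile", "flows_through", "Egypt"),
]

def build_graph(triples):
    """Turn triples into an adjacency map: node -> property -> set of target nodes."""
    graph = {}
    for source, prop, target in triples:
        graph.setdefault(source, {}).setdefault(prop, set()).add(target)
    return graph

def follow(graph, start, path):
    """Follow a sequence of property edges from start and return the nodes reached."""
    frontier = {start}
    for prop in path:
        frontier = {
            target
            for node in frontier
            for target in graph.get(node, {}).get(prop, set())
        }
    return frontier
```

A question such as "On which continent is the country whose capital is Cairo?" then becomes `follow(build_graph(TRIPLES), "Cairo", ["capital_of", "located_in"])`.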

Machine Learning

Machine learning has become a crucial tool in almost all NLP tasks in the last few years. In our platform, machine learning contributes mainly to two tasks: intent retrieval (Utterance Classification and Utterance Similarity) and Utterance Clustering. It gives highly competitive results.

Utterance Classification:

The main challenge for chatbot platforms is predicting the intention behind the input utterance. Utterance classification algorithms help with that challenge by matching the user utterance to one of the well-tagged intent examples.

The AlKhwarizmi platform uses a number of strong lexical and grammatical features to build a very accurate and flexible utterance classifier. This classifier considers different aspects and covers a wide range of the ways people express their intents and ideas.

Utterance Similarity:

Document similarity is a measure of how similar two texts are to each other. For example, suppose one person says "I want pizza" and another says "I want tea". If a third person then says "I want to eat pizza", we would say that this person shares the intent of the first speaker by, say, 90% and that of the second by 70%, because the wording is closer to the first text than to the second. This simple intuition is the basis of document similarity. The real task is of course not that simple, but it follows the same line. In utterance similarity, the length ratio between the two texts, the presence of entities, and the order of entities all matter. Implementing accurate utterance similarity algorithms enables the AlKhwarizmi platform to predict the intention of the user from the examples database. The more comprehensive and varied this examples database is, the more accurate the intent prediction will be.
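
The intuition can be sketched with a simple score that combines word overlap with the length ratio of the two texts. This is an assumed toy measure, not the platform's algorithm: it ignores entities and their order, and it will not reproduce the illustrative percentages quoted above.

```python
def similarity(a, b):
    """Score two utterances in [0, 1] using word overlap scaled by length ratio."""
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    set_a, set_b = set(tokens_a), set(tokens_b)
    # Jaccard overlap: shared words divided by all distinct words.
    overlap = len(set_a & set_b) / len(set_a | set_b)
    # Length ratio: penalize pairs whose lengths differ a lot.
    length_ratio = min(len(tokens_a), len(tokens_b)) / max(len(tokens_a), len(tokens_b))
    return overlap * length_ratio
```

With this score, "I want to eat pizza" comes out closer to "I want pizza" than to "I want tea", which matches the ordering in the example even though the absolute numbers differ.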

Utterance Clustering:

Analysing live chats and customer service chats is a basic step in automating these processes and making them more organized and informative. For this purpose, our platform includes a powerful and accurate machine learning utterance clustering system.

This utterance clustering system uses datasets extracted from customer service chats and any other human interactions between an agent and the business's clients, and clusters these chats into a number of categories that are meaningful to the business.

With the help of our utterance clustering tool, business owners can tackle new and unplanned intentions of their customers. This should guarantee faster and stronger development across different business areas.
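
To make the grouping step concrete, here is a small sketch that uses greedy "leader" clustering over word overlap. This technique is swapped in purely for illustration; the clustering method actually used by the platform is not described here, and the threshold value is an assumption.

```python
def jaccard(a, b):
    """Word-overlap similarity between two utterances, in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def cluster(utterances, threshold=0.4):
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list whose first item acts as its leader
    for utterance in utterances:
        for members in clusters:
            if jaccard(utterance, members[0]) >= threshold:
                members.append(utterance)
                break
        else:
            # No existing leader was close enough, so start a new cluster.
            clusters.append([utterance])
    return clusters
```

An utterance that matches no existing leader opens a new cluster, which is how new and unplanned customer intentions would surface in this simplified setting.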

Logic Programming

Language is the screen on which our ideas and logic are displayed. So, to make bots talk properly, we should equip them with a tool for logical thinking and knowledge representation. Logic programming is the branch of artificial intelligence that enables machines to represent knowledge and reason logically.

First-order logic:

First-order logic is a way of representing facts and relations. This representation gives you a means to judge whether other facts are true or false. It also helps machines deduce new facts and relations.


Prolog is the most popular logic programming language. Here facts and rules express logic in different domains. Prolog runs queries over relations in order to get new facts and rules.
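
The idea of deriving new facts from existing facts and rules can be sketched in Python. The example below applies two hypothetical rules, "a parent is an ancestor" and "a parent of an ancestor is an ancestor", by forward chaining until nothing new appears. Note that Prolog itself answers queries by backward chaining, so this is an illustration of deduction in general rather than of how Prolog executes.

```python
def derive_ancestors(parent_facts):
    """Derive all (ancestor, descendant) pairs from (parent, child) facts."""
    # Rule 1: every parent is an ancestor.
    ancestors = set(parent_facts)
    changed = True
    while changed:
        changed = False
        # Rule 2: if X is a parent of Y and Y is an ancestor of Z, X is an ancestor of Z.
        for x, y in parent_facts:
            for y2, z in list(ancestors):
                if y == y2 and (x, z) not in ancestors:
                    ancestors.add((x, z))
                    changed = True
    return ancestors
```

Given the facts that Sara is a parent of Omar and Omar is a parent of Lina, the function deduces the new fact that Sara is an ancestor of Lina, which was never stated directly.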

© 2020 AlKhwarizmi. All Rights Reserved.

AlKhwarizmi is a product of ASAS ALQARAR.


AlKhwarizmi is priced in different plans and is available in cloud or on-premises.
