Linguistic rules
Rule-based pattern matching can rely on simple boolean keywords or on more complex models compiled over time by language experts. Linguistic rules range from identifying parts of speech, syntax, and inflections to rules about different topics, regions, and stylistic variations. Once written, the rules can be applied quickly to a set of documents.
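As a minimal sketch of the "simple boolean keywords" end of that range, the Python snippet below labels a document when all of a rule's required keywords are present and none of its excluded keywords are. The rule set, the labels, and the function names (`RULES`, `tokenize`, `match_rules`) are hypothetical and chosen only for illustration. Real rule systems are usually far richer.

```python
import re

# Hypothetical rules: each topic label lists keywords that must all appear
# ("all") and keywords that must not appear ("none").
RULES = {
    "billing": {"all": {"invoice"}, "none": {"spam"}},
    "shipping": {"all": {"package", "late"}, "none": set()},
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_rules(text, rules=RULES):
    """Return the sorted labels of every rule the text satisfies."""
    tokens = tokenize(text)
    return sorted(
        label
        for label, rule in rules.items()
        if rule["all"] <= tokens and not (rule["none"] & tokens)
    )


documents = [
    "My package arrived late again.",
    "Please resend the invoice for March.",
    "Thanks for the quick reply!",
]
for doc in documents:
    print(match_rules(doc), doc)
```

Because each rule is a pair of set comparisons, applying the rules to a large batch of documents is cheap. The effort goes into writing the rules, not running them.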
Linguistic rules benefits
Fast analysis
Once the rules have been created, the analysis runs quickly.
Mistakes are easy to spot
It is easy to see where rules succeed and where they return irrelevant data.
Granular analysis
Text can be broken into smaller chunks for analysis.
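One way to picture this granularity is to split a text into sentences and apply a rule to each sentence separately, recording which keywords fired. The sketch below assumes a deliberately naive sentence splitter and a hypothetical complaint rule (`COMPLAINT`). Neither is taken from any particular product.

```python
import re

# Hypothetical rule: keywords that suggest a complaint.
COMPLAINT = re.compile(r"\b(broken|refund|late)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_sentences(text):
    """Return (index, sentence, matched keywords) for each matching sentence."""
    hits = []
    for i, sentence in enumerate(split_sentences(text)):
        found = [word.lower() for word in COMPLAINT.findall(sentence)]
        if found:
            hits.append((i, sentence, found))
    return hits


review = (
    "The staff were friendly. "
    "My order was late and the box was broken! "
    "Would I return?"
)
for index, sentence, keywords in flag_sentences(review):
    print(index, keywords, sentence)
```

Reporting the matched keywords alongside each sentence also makes it straightforward to check why a given chunk was flagged.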
Results closely match expectations
Rule-based analysis finds what you are looking for, but it often reinforces initial assumptions instead of challenging them with a broader perspective.
Linguistic rules trade-offs
There are always exceptions to rules
Language is variable, constantly changing, and often informal. Rules cannot account for every way meaning can be expressed in text. Because the rules are rigid, text analysis based on them often misses relevant information.
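The rigidity is easy to reproduce with a toy case. In the sketch below, a hypothetical rule matches only the exact word "refund", and the sample messages are invented for illustration. Three of the four messages ask for the same thing, yet the rule does not catch them because they use an inflected form, a paraphrase, or informal wording.

```python
import re

# Hypothetical rule: match the exact word "refund" only.
RULE = re.compile(r"\brefund\b", re.IGNORECASE)

# Invented sample messages that all express the same request.
messages = [
    "I would like a refund.",
    "Can I get my money back?",
    "Still waiting to be refunded...",
    "pls reimburse me asap",
]


def matches(text):
    """Return True when the rule finds the exact keyword in the text."""
    return RULE.search(text) is not None


missed = [m for m in messages if not matches(m)]
print(f"Rule matched {len(messages) - len(missed)} of {len(messages)} messages")
for message in missed:
    print("missed:", message)
```

Each miss can be patched with another rule, but new phrasings keep appearing, so the rule set never becomes complete.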
Building complex rules can take years
Complex rules based on expert knowledge can require years of research to compile the resources needed for the analysis.
Detailed development for each language
Languages that have not been widely studied may be hard to analyze until extensive research has been done on the unique features of their grammar and vocabulary.
Narrow approach
Rules are created by humans with inherent biases, and they match only the patterns their authors expected to find. Relying on static resources makes it harder to discover trends and new ways of expressing ideas.