JavaScript regular expressions have a long-standing problem with Unicode. By default, \w matches only [A-Za-z0-9_], and \b is defined in terms of \w. As a result, a regular expression such as /\b\S+\b/g will not count words written in many national alphabets and scripts, such as Cyrillic, Greek and Devanagari (Hindi). The culprit is \b rather than \S: \S matches any non-whitespace character, but \b never finds a word boundary next to a non-ASCII letter. To solve this problem we must explicitly include the Unicode characters. My solution is to use the regular expression /([\u0080-\uFFFF\w]\u0027?)+/g instead. It covers everything from U+0080 to U+FFFF, which includes the national alphabets, plus the ASCII word characters and the apostrophe (U+0027). Note that this range also contains non-ASCII punctuation such as curly quotes and long dashes, so those are treated as word characters; only ASCII punctuation and special characters are skipped. I tested the regex at https://regexr.com with a sample text that includes words from several alphabets, and it counted all 55 words accurately, ignoring the special characters and punctuation in it.
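To make the difference concrete, here is a minimal sketch that runs both patterns over a short mixed-script string. The helper name countMatches and the sample sentence are my own assumptions for illustration; this is not the 55-word sample text mentioned above.

```javascript
// The boundary-based pattern from the text: relies on \b, which is ASCII-only.
const NAIVE_PATTERN = /\b\S+\b/g;

// The replacement pattern from the text: ASCII word characters plus any
// UTF-16 code unit in U+0080..U+FFFF, each optionally followed by an
// apostrophe (U+0027).
const WORD_PATTERN = /([\u0080-\uFFFF\w]\u0027?)+/g;

// Hypothetical helper (not part of the original answer): returns how many
// non-overlapping matches of `pattern` occur in `text`.
function countMatches(text, pattern) {
  const matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

// Illustrative sample: English, Russian and Greek words with ASCII punctuation.
const sample = "Hello, привет! Γειά σου; don't stop.";

console.log(countMatches(sample, NAIVE_PATTERN)); // 3 (only Hello, don't, stop)
console.log(countMatches(sample, WORD_PATTERN));  // 6 (all six words)
```

The boundary-based pattern skips the Cyrillic and Greek words entirely because no \b position exists around them, while the explicit character range picks them up.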