Skip to main content

Accurate Word Counter for non-Latin characters in Javascript regex

There is a problem that involves Javascript and regular expressions. The JS implementation of regexp does not support Unicode properly, for example /\b\S+\b/g regular expression will not count words with Unicode characters of many national alphabets and scripts, such as Cyrillic, Greek and Hindi. Unfortunately \S is restricted to Latin-only characters of English alphabet.

To solve this problem we must explicitly include all Unicode characters. My solution is to use /([\u0080-\uFFFF\w]\u0027?)+/g regular expression instead. It covers the wide range of Unicode characters (from 0080 to FFFF) that includes all national alphabets + apostrophe symbol (0027). This regex has been tested with the following sample text and it counts all 55 words accurately, ignoring all special characters and punctuation, I used https://regexr.com to test it with this sample text that includes words from several alphabets.

Popular posts from this blog

Switching between keyboard layouts in Openbox (Arch Linux)

Switching between two (or more) keyboard layouts in Openbox DE is a task that's quite easy to accomplish, although it might not be so obvious as in other desktop environments. This solution was tested on Arch Linux. You just need to edit this file (assuming you want to switch between English and Ukrainian Phonetic layouts with Alt-Shift): /etc/X11/xorg.conf.d/01-keyboard-layout.conf Section "InputClass" Identifier "keyboard-layout" Driver "evdev" MatchIsKeyboard "yes" Option "XkbLayout" "us,ua(phonetic)" Option "XkbModel" "pc105" Option "XkbOptions" "grp:alt_shift_toggle" EndSection If you have Nvidia card, don't forget to edit /etc/X11/xorg.conf.d/20-nvidia.conf and change Driver from "kbd" to "evdev" in InputDevice section: Section "InputDevice" Identifier "Keyboard0" Driver "evdev" EndSection Y...

How to control CPU frequency on Linux Fedora 20

Sometimes you need to make CPU run faster or slower, depending on your needs. In Linux you can control this by using different CPU governors. The following steps were tested on Fedora 20, but most likely they would also work on later versions of Linux Fedora OS. sudo yum install kernel-tools Check the available CPU governors: sudo cpupower frequency-info --governors sudo cpupower frequency-set --governor [governor] More information about the governors here: http://docs.fedoraproject.org/en-US/Fedora/20/html/Power_Management_Guide/cpufreq_governors.html

Using querySelector in Selenium WebDriver

Sometimes it's very helpful if we are able to click some element directly, by calling JavaScript click event. For example, this can be the case when the element is not visible, or not clickable, thus we cannot use native WebDriver methods. WebDriver driver = new FirefoxDriver(); JavascriptExecutor js = (JavascriptExecutor)driver; js.executeScript("document.querySelector('.pane .object_options .dropdown-toggle span').click();");