## Tuesday, 12 September 2017

### Algorithms | The Rabin-Karp algorithm

The fundamental string searching (matching) problem is defined as follows: given two strings – a text and a pattern, determine whether the pattern appears in the text. The problem is also known as "the needle in a haystack problem."

The “Naive” Method
Its idea is straightforward — for every position in the text, considers it a starting position of the pattern and sees if you get a match.

Pseudo code for Naïve(Brute force)
function NaiveSearch(string s[1..n], string pattern[1..m])
for i from 1 to n-m+1
for j from 1 to m
if s[i+j-1] ≠ pattern[j]
return i

The “naive” approach is easy to understand and implement but it can be too slow in some cases. If the length of the text is n and the length of the pattern m, in the worst case it may take as much as (n * m) iterations to complete the task.

It should be noted though, that for most practical purposes, which deal with texts based on human languages, this approach is much faster since the inner loop usually quickly finds a mismatch and breaks. A problem arises when we are faced with different kinds of “texts,” such as the genetic code.

Rabin-Karp Algorithm
This is actually the “naive” approach augmented with a powerful programming technique – the hash function. In order to avoid the comparison between the pattern and the text character by character, we’ll try to compare them at once, so we need a good hash function. With its help, we can hash the pattern and check against hashed sub-strings of the text.

Pseudo code for Rabin-Karp
function RabinKarp(string s[1..n], string pattern[1..m])
hpattern := hash(pattern[1..m]);
for i from 1 to n-m+1
hs := hash(s[i..i+m-1])
if hs = hpattern
if s[i..i+m-1] = pattern[1..m]
return i

Java code for Rabin-Karp
package com.algo;

public class RobinKarp {

private static int WINDOW;

/** PRIME NUMBER which is required the hash function. */
private static final int PRIME_NUM = 3;

/**
* Compare the pattern to text from a given index.
*
* @param text
* @param pattern
* @param i
* @return true/false
*/
private static boolean compare(char[] text, char[] pattern, int i) {
for (int j = 0; j < WINDOW; j++) {
if (text[j + i + 1] != pattern[j]) {
return false;
}
}
return true;
}

/**
* This method is used to calculate the hash of all window in O(n) time complexity.
* Approach:
* 1. Roll out the value of start method
* 2. Divide by prime to reposition the elements, second position element will start
*      element as the prime multiplication has been reduced.
* 3. Include the next element with prime multiplication.
*
* @param str
*/
private static int getRollingHash(char[] array, int s, int hash) {

/** Rolling out first element of window. */
hash = hash - (array[s] - 'a' + 1);

/** Divide by prime number. */
hash = hash / PRIME_NUM;

/** Add new element to window. */
hash = (int) (hash + (array[s + WINDOW] - 'a' + 1) * Math.pow(PRIME_NUM, WINDOW - 1));

return hash;
}

/**
* This method is used to calculate the hash.
*
* @param array
* @param s
* @param l
* @return hash code.
*/
private static int hash(char[] array, int s, int l) {
int hash = 0;

for (int i = s; i < l; i++) {
hash = hash + (int) ((array[i] - 'a' + 1) * Math.pow(PRIME_NUM, i));
}

return hash;
}

private static void robinKarp(String text, String pattern) {

char[] textArray = text.toCharArray();
char[] patternArray = pattern.toCharArray();

/* Window size = length of pattern. */
WINDOW = pattern.length();

/* Calculate the hash of the pattern. */
int pHash = hash(pattern.toCharArray(), 0, WINDOW);

/* Initially calculate the hash of the window size text. */
int tHash = hash(textArray, 0, WINDOW);

for (int i = 0; i < text.length() - WINDOW; i++) {

/**
* Calculate the Rolling hash code
**/
tHash = getRollingHash(textArray, i, tHash);

/**
* First compare the rolling hash of text with hash of pattern if matched,
* compare the Strings else compare next rolling hash.
**/
if (tHash == pHash && compare(textArray, patternArray, i)) {
System.out.println("Pattern occurs with shift " + i);
break;
}
}
}

/**
* Driver method.
*
* @param args
*/
public static void main(String[] args) {
String text = "abedacda";
String pattern = "cda";

/** This method will print the index of pattern in text. */
robinKarp(text, pattern);
}
}

Time complexity
The average and best case running time of the Rabin-Karp algorithm is O(n+m), but its worst-case time is O(nm). Worst case of Rabin-Karp algorithm occurs when all characters of pattern and text are same as the hash values of all the substrings of text [] match with the hash value of pattern[].
Example: pattern[] = “AAA” and text[] = “AAAAAAA”.

Multiple pattern searches
The Rabin–Karp algorithm is inferior for single pattern searching to Knuth–Morris–Pratt algorithm, Boyer–Moore string search algorithm and other faster single pattern string searching algorithms because of its slow worst-case behavior. However, it is an algorithm of choice for multiple pattern searches.

That is, if we want to find any of a large number, say k, fixed length patterns in a text, we can create a simple variant of the Rabin–Karp algorithm that uses a Bloom filter or a set data structure to check whether the hash of a given string belongs to a set of hash values of patterns we are looking for:

function RabinKarpSet(string s[1..n], set of string subs, m):
set hsubs := emptySet
foreach sub in subs
insert hash(sub[1..m]) into hsubs
hs := hash(s[1..m])
for i from 1 to n-m+1
if hs  hsubs and s[i..i+m-1]  subs
return i
hs := hash(s[i+1..i+m])
We assume all the substrings have a fixed length m.

A naïve way to search for k patterns is to repeat a single-pattern search taking O(n+m) time, totaling in O((n+m)k) time. In contrast, the variant algorithm above can find all k patterns in O(n+km) time in expectation, because a hash table checks whether a substring hash equals any of the pattern hashes in O(1) time.

Java code for Rabin-Karp Multiple search
package com.algo;
import java.util.HashMap;
import java.util.Map;

public class RobinKarp {

private static int WINDOW;

/** PRIME NUMBER which is required the hash function. */
private static final int PRIME_NUM = 3;

/**
* Compare the pattern to text from a given index.
*
* @param text
* @param pattern
* @param i
* @return true/false
*/
private static boolean compare(char[] text, char[] pattern, int i) {
for (int j = 0; j < WINDOW; j++) {
if (text[j + i + 1] != pattern[j]) {
return false;
}
}
return true;
}

/**
* This method is used to calculate the hash of all window in O(n) time complexity.
* Approach:
* 1. Roll out the value of start method
* 2. Divide by prime to reposition the elements, second position element will be
*     start element as the prime multiplication has been reduced.
* 3. Include the next element with prime multiplication.
*
* @param str
*/
private static int getRollingHash(char[] array, int s, int hash) {

/** Rolling out first element of window. */
hash = hash - (array[s] - 'a' + 1);

/** Divide by prime number. */
hash = hash / PRIME_NUM;

/** Add new element to window. */
hash = (int) (hash + (array[s + WINDOW] - 'a' + 1) * Math.pow(PRIME_NUM, WINDOW - 1));

return hash;
}

/**
* This method is used to calculate the hash.
*
* @param array
* @param s
* @param l
* @return hash code.
*/
private static int hash(char[] array, int s, int l) {
int hash = 0;

for (int i = s; i < l; i++) {
hash = hash + (int) ((array[i] - 'a' + 1) * Math.pow(PRIME_NUM, i));
}

return hash;
}

private static void robinKarp(String text, String[] pattern) {
char[] textArray = text.toCharArray();
Map<Integer, String> hsubs = new HashMap<Integer, String>();

/**
* Calculate the hash of the pattern all pattern and Store it into the map. .
*/
for (int i = 0; i < pattern.length; i++) {
/* Window size = length of pattern. */
WINDOW = pattern[i].length();

int pHash = hash(pattern[i].toCharArray(), 0, WINDOW);
/* Put the pattern hash into the map. */
hsubs.put(pHash, pattern[i]);
}

/* Initially calculate the hash of the window size text. */
int tHash = hash(textArray, 0, WINDOW);

for (int i = 0; i < text.length() - WINDOW; i++) {

/**
* Calculate the Rolling hash code
**/
tHash = getRollingHash(textArray, i, tHash);

/**
* First compare the rolling hash of text with hash of pattern if matched,
* compare the Strings else compare next rolling hash.
**/
if (hsubs.containsKey(tHash) && compare(textArray, hsubs.get(tHash).toCharArray(), i)) {
System.out.println("Pattern occurs with shift " + i);
break;
}
}
}

/**
* Driver method.
*
* @param args
*/
public static void main(String[] args) {
String text = "abedacda";
String[] pattern = { "cda", "da" };

/** This method will print the index of pattern in text. */
robinKarp(text, pattern);
}
}