Rabin-Karp pattern search algorithm
...y residues of modulo p, from 0,1,2?(p-1). If this fingerprint function is applied to the above numbers, we get Np(X) and Np(Y(i)).
We can reduce N(X) into Np(Y(i)) easily using the properties of modulo arithmetic, N(X) can also be written as,
N(X)= (((x1*2+x2 )*2+..)?xm-1)*2+xm
and Np(X)= (((x1*p2+px2 )*p2+p..)?xm-1)*p2+pxm
where x1*p2 = (x1 * 2) mod p
so Np(X) can be calculated easily by evaluating from the inner most parenthesis one by one. Np(Y(i)) can be calculated similar way but we need to compare this against the patter for i between 1 and n-m+1. Np(Y(i)), for every i can be calculated easily using the update formula, i.e.
Np(Y(i+1))=(Np(Y(i)) –p 2(m-1)*Y(i)) *p 2 + Y(i+m)
Using the update formula we can calculate the hash value for the next sub text from the hash value of the preceding sub text. Now Np(X) is compared against Np(Y(i)) and if they are equal for a i then the match is found in the search text. When using one prime it may happen that Np(X)=Np(Y(i)) but still N(X) is not equal to N(Y(i)), so we need to check the strings to make sure that strings are indeed correct. But using several primes and several fingerprint functions reduces the probability of an erroneous match.
If k primes are chosen at random between 1 and M then for a match,
Np(j)(X) = Np(j)(Y(i)) for 1 <= j <= k
k fingerprint functions are used corresponding to k primes. The probability of an erroneous match in this case is,
Perror = t(?m)/ ?n))^k where m>= 29, ?m) is number of primes less than or equal to m and t=n-m+1
III. Implementation of the algorithm in C
#include