Saturday, February 11, 2006

suffix array: a useful data structure in string manipulation

Reading digest < Programming Pearls >
-----------------------------------
I found that the suffix array is an interesting data structure in manipulating strings.

The following is a sample code snippet to construct a suffix array.

char c[MAXLEN], *a[MAXLEN];

int n = 0;

while ((ch = getchar()) != EOF) {
a = &c[n];
c[n++] = ch;
}

c[n] = 0;


The suffix array could be used to find the repeated strings and quick substring searching.

Another common used suffix array is to point to each word, instead of each letter in the above example.

char inputchars[MAXLEN];
char *word[MAXLEN/5]; /* assume the average length of words is 5 */

words[0] = inputchars;
int nword = 0;

while (scanf("%s", word[nword]) != EOF) {
word[nword+1] = word[nword] + strlen(word[nword]) + 1;
nword++;
}


Each word is appended to inputchars (no extra storage is needed).

No comments: