Idioma:

Efficient text searching

Ricardo Baeza-Yates Gaston H Gonnet

1989

Localização: IME - Inst. Matemática e Estatística IMRE SIMON (CIS QA758.T B142e e.2 )(Acessar)

Enviar para

Título:
Efficient text searching
Autor: Ricardo Baeza-Yates
Gaston H Gonnet
Assuntos: Search theory; Algorithms; Text processing (Computer science); ALGORITMOS E ESTRUTURAS DE DADOS (TESE DOUTORADO); ALGORITMOS PARA PROCESSAMENTO (TESE DOUTORADO)
Notas: Thesis (Ph. D.)--University of Waterloo, 1989 ; Includes bibliographical references and index
Descrição: Abstract: "This thesis presents and analyses new algorithms for text searching. We place special emphasis on the average case of each algorithm, because we are interested in practical applications. We consider plain text and preprocessed text. In preprocessed (indexed) text our main result is to show that automaton searching over a digital tree requires time sublinear in the size of the text for any regular expression; this is the first algorithm with a searching time complexity better than linear. We also show that for a restricted set of regular expressions, searching requires logarithmic time in the expected case. In all these algorithms, the time is independent of the size of the answer.
For searching in plain text, our main results are the following. First, we analyze the expected case of the best known string matching algorithms, obtaining bounds for the Knuth-Morris-Pratt algorithm, and the Boyer-Moore algorithm and several of its variations. We also present improvements to Boyer-Moore type algorithms, in particular for searching English text. Second, we analyze three new, simple, and fast algorithms to solve the problem of string matching with mismatches. These algorithms are eminently practical and can be extended to other variations of this problem. Finally, we present a new approach to pattern matching, based on a numerical representation of the state of the search. We apply this method to patterns including classes of symbols, complement, 'don't care' symbols, multiple patterns, and mismatches. The search time is linear for most practical cases. One of the advantages of these algorithms is that they are suitable for hardware implementation."
Títulos relacionados: Série:University of Waterloo Computer Science Dept Research report CS-89-17
Data de criação/publicação: 1989
Formato: xi, 129 p ill 26 cm.
Idioma: Inglês

Links

Este item no Dedalus

Voltar para lista de resultados

Realização: Logos de Redes Sociais:

Efficient text searching

Ricardo Baeza-Yates Gaston H Gonnet

1989

Buscando em bases de dados remotas. Favor aguardar.