Combining sentence similarities measures to identify paraphrases

Ferreira, Rafael ; Cavalcanti, George D.C ; Freitas, Fred ; Lins, Rafael Dueire ; Simske, Steven J ; Riss, Marcelo

Computer Speech & Language, January 2018, Vol.47, pp.59-73 [Peer-reviewed journal]

Full text available

  • Title:
    Combining sentence similarities measures to identify paraphrases
  • Author: Ferreira, Rafael ; Cavalcanti, George D.C ; Freitas, Fred ; Lins, Rafael Dueire ; Simske, Steven J ; Riss, Marcelo
  • Subjects: Sentence Similarity ; Paraphrase Identification ; Sentence Simplification ; Graph-Based Model ; Engineering ; Computer Science
  • Is part of: Computer Speech & Language, January 2018, Vol.47, pp.59-73
  • Description:
    • It proposes a new paraphrase identification system based on lexical, syntactic, and semantic analysis.
    • It uses different machine learning algorithms to classify the paraphrase.
    • The measure was evaluated using a state-of-the-art dataset: the Microsoft Paraphrase Corpus.
    Paraphrase identification is the task of verifying whether two sentences are semantically equivalent. It is applied in many natural language tasks, such as text summarization, information retrieval, text categorization, and machine translation. In general, methods for paraphrase identification perform three steps. First, they represent sentences as vectors using bag of words or syntactic information about the words present in the sentence. Next, this representation is used to measure different similarities between the two sentences. In the third step, these similarities are given as input to a machine learning algorithm that classifies the two sentences as a paraphrase or not. However, two important problems in the area of paraphrase identification are not handled: (i) the meaning problem: two sentences may share the same meaning yet be composed of different words; and (ii) the word order problem: the order of the words in the sentences may change the meaning of the text. This paper proposes a paraphrase identification system that represents each pair of sentences as a combination of different similarity measures. These measures extract lexical, syntactic, and semantic components of the sentences encompassed in a graph. The proposed method was benchmarked using the Microsoft Paraphrase Corpus, which is the publicly available standard dataset for the task. Different machine learning algorithms were applied to classify a sentence pair as a paraphrase or not. The results show that the proposed method outperforms state-of-the-art systems.
  • Language: English
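
The generic three-step pipeline summarized in the description (vector representation, similarity measures, classification) can be illustrated with a minimal Python sketch. This is not the authors' system: it omits the graph-based and semantic components, and every function name, the choice of the three measures, the weights, and the threshold are hypothetical, chosen only for illustration. A real system would train a classifier on labeled pairs instead of using hand-picked weights.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by size of the union."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def bigram_overlap(tokens_a, tokens_b):
    """Jaccard overlap of adjacent word pairs, which is sensitive to word order."""
    bigrams_a = zip(tokens_a, tokens_a[1:])
    bigrams_b = zip(tokens_b, tokens_b[1:])
    return jaccard_similarity(bigrams_a, bigrams_b)


def similarity_features(sentence_a, sentence_b):
    """Steps 1 and 2: represent both sentences and compute several similarities."""
    tokens_a, tokens_b = tokenize(sentence_a), tokenize(sentence_b)
    return [
        cosine_similarity(tokens_a, tokens_b),
        jaccard_similarity(tokens_a, tokens_b),
        bigram_overlap(tokens_a, tokens_b),
    ]


def is_paraphrase(features, weights=(0.4, 0.3, 0.3), threshold=0.7):
    """Step 3 stand-in: a fixed linear score in place of a trained classifier.

    The weights and threshold are assumptions, not learned values.
    """
    score = sum(w * f for w, f in zip(weights, features))
    return score >= threshold


if __name__ == "__main__":
    pair = ("The dog bit the man.", "The man bit the dog.")
    print(similarity_features(*pair))
```

For the pair "The dog bit the man." and "The man bit the dog.", the two bag-of-words measures treat the sentences as essentially identical, while the bigram overlap drops to 0.6. That gap is a small instance of the word order problem named in the description; the meaning problem (same meaning, different words) is not addressed by any of these lexical measures.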
