Jaume Nualart Vilaplana, Researcher

Gabriela Ferraro,

Hanna Suominen and

Jaume Nualart

(MLRG-NICTA, Australia)

Status: published
Type: Conference Paper
Conference/location: EACL
Conference URL: http://mcs.open.ac.uk/nlg/pitr2014/

This is a project to improve readability with Natural Language Processing (NLP) and graphic design. This project was done at Machine Learning Research Group (NICTA) in collaboration with Dr Gabriela Ferraro, and Dr Hanna Suominen.

Layout design can improve text analysis outputs

ABSTRACT

Good readability of text is important to ensure efficiency in communication and eliminate risks of misunderstanding. Patent claims are an example of text whose readability is often poor. In this paper, we aim to improve claim readability by a clearer presentation of its content. Our approach consist in segmenting the original claim content at two levels. First, an entire claim is segmented to the components of preamble, transitional phrase and body, using a rule-based approach. Second, a conditional random field is trained to segment the components into clauses. An alternative approach would have been to modify the claim content which is, however, prone to also changing the meaning of this legal text. For both segmentation levels, we report results from statistical evaluation of segmentation performance. In addition, a qualitative error analysis was performed to understand the problems underlying the clause segmentation task. Our accuracy in detecting the beginning and end of preamble text is 1.00 and 0.97, respectively. For the transitional phase, these numbers are 0.94 and 1.00 and for the body text, 1.00 and 1.00. Our precision and recall in the clause segmentation are 0.77 and 0.76, respectively. The results give evidence for the feasibility of automated claim and clause segmentation, which may help not only inventors, researchers, and other laypeople to understand patents but also patent experts to avoid future legal cost due to litigations.