Canzhong Wu, Macquarie University, Sydney.
Abstract
This thesis is concerned with modelling linguistic resources. Its central goal is to build a theoretical model and to realize this model computationally in the form of a resource development system. The system is based on systemic-functional theory and draws on insights from computational linguistics and corpus linguistics. It is intended to serve as an amanuensis, an environment that assists in doing systemic research, and is thus called SysAm.
SysAm aims to open up bottlenecks in both linguistics and computational linguistics. Both areas of research have been greatly hampered by the inability to carry out automatic or semi-automatic analysis of large volumes of text, and by the lack of integration between automatic and manual analysis in analysis systems. On the one hand, SysAm provides tools for automatic low-level text analysis (SysConc) and high-level manual analysis (SysFan) at the instantial end of the cline of instantiation; on the other hand, it provides tools for developing and managing linguistic systems at the potential end. These tools, together with the other components of SysAm, constitute an integrated environment for the development of linguistic resources.
In Chapter 1, I look at the various demands for linguistic resources in natural language processing (NLP) and other areas, and stress the need for an integrated resource development environment and a theoretically comprehensive model. In Chapter 2, I introduce systemic-functional theory as the source of the general dimensions for organizing the linguistic resources and differentiating the SysAm tools. In Chapter 3, I describe the development of linguistic resources in SysRef, and illustrate how the resources are partitioned and indexed to meet different consumer demands. In Chapter 4, I focus on the manual analysis of textual instances with SysFan, illustrating how the analysis results can be interpreted and visualized in the system. In Chapter 5, I discuss the major functions of SysConc, a concordance program that is specifically geared to systemic-functional research but may also be used as a general corpus tool for extracting linguistic patterns from a corpus. In Chapter 6, I explore SysAm as an integrated resource development workbench, showing the move along the cline of instantiation from the systemic potential to the textual instance, or from the textual instance to the systemic potential.
The research is significant in both the theoretical and the practical spheres: on the one hand, it serves as the starting point for future long-term work on large-scale linguistic descriptions based on an open corpus; on the other hand, it creates an online resource urgently needed for a variety of applications, ranging from educational research to computational research.
Table of Contents