Language: English

Time: Every third year, fourth period (spring)

Level: Advanced

Understanding of the fundamental methods for mining information hidden across very large collections of text. Practical skills in working with machine learning methods and in applying basic language technology tools to textual data. Understanding of the possibilities and limitations of state-of-the-art language technology methods.

Topics: Web crawls and other large collections of textual data. Preprocessing of large text corpora: segmentation, tagging, and syntactic analysis. Pattern matching, supervised machine learning, and clustering. Information extraction and aggregation. Applications, e.g. in scientific literature mining.