Japanese Language Text Mining: Digital Humanities Methods for Japanese Studies

June 20-22 | 9:00 am - 4:00 pm
June 23 | 9:00 am - 1:00 pm

VENUE: Biological Sciences Learning Center

This closed workshop will bring together researchers from a variety of disciplines who are interested in learning methods for text analysis of Japanese-language materials. The workshop will focus on the unique challenges of digital analysis of Japanese texts while introducing foundational methods and principles of text analysis. Topics covered include:

  • Finding and using web-based text collections
  • Using web-based analytical tools
  • Creating digital collections with OCR (Optical Character Recognition) software
  • Programming fundamentals
  • Metadata preparation and pre-processing tasks (e.g., word segmentation)
  • Principles of text mining (e.g., word counts, collocations, document-term matrices, document similarity measures, distinctive word analysis)
  • Overview of advanced techniques (e.g., topic models, word-embedding models)

Workshop Organizers

Hoyt Long (University of Chicago): hoytlong@uchicago.edu
Mark Ravina (University of Texas, Austin): Mark.Ravina@austin.utexas.edu
Paula R. Curtis (University of California, Los Angeles): prcurtis@umich.edu

This workshop is supported by a U.S. Department of Education Title VI National Resource Center grant.