Analyzing Documents with TF-IDF | Programming Historian

Published on OpenMethods: 2019-09-15 20:37:30
Post: https://openmethods.dariah.eu/2019/09/15/analyzing-documents-with-tf-idf-programming-historian/
Lesson: https://programminghistorian.org/en/lessons/analyzing-documents-with-tfidf

Introduction by OpenMethods Editor (Rombert Stapel):

The indispensable Programming Historian offers an introduction to Term Frequency – Inverse Document Frequency (tf-idf), written by Matthew J. Lavin. The procedure, which measures how specific a term is to a given document, has its origins in information retrieval, but it can also be applied as an exploratory tool, as a way of finding textual similarity, or as a pre-processing step for machine learning. It is therefore useful not only for textual scholars, but also for historians working with large collections of text.

This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency – Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
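
To make the idea of term specificity concrete, here is a minimal sketch of one common textbook form of tf-idf: the raw count of a term in a document multiplied by the natural logarithm of (number of documents / number of documents containing the term). The function name tf_idf, the whitespace tokenizer, and the three toy documents are illustrative assumptions, not taken from the lesson, and libraries such as scikit-learn use smoothed variants of the formula that give different numbers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed weighting (one of several variants): count * ln(N / df).
    """
    # Naive tokenizer for illustration only: lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)  # term frequency within this document
        scores.append(
            {term: count * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return scores


# Hypothetical toy corpus, made up for this example.
docs = [
    "the oil trust and the press",
    "the press and the public",
    "the novel and the prairie",
]
scores = tf_idf(docs)

# Print each document's terms from most to least distinctive.
for doc, doc_scores in zip(docs, scores):
    ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
    print(doc, "->", [(term, round(score, 3)) for term, score in ranked])
```

Under this weighting, a word that occurs in every document (such as "the" above) scores zero however often it is repeated, while a word found in only one document scores highest. That is the sense in which tf-idf captures how specific a term is to a document.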

Source: Analyzing Documents with TF-IDF | Programming Historian