Workflow Automation Templates
A library of ready-to-use workflow templates to accelerate your data journey

LDA Topic Modeling
Discover topics from text data

Overview
This workflow uses Latent Dirichlet Allocation (LDA) to automatically discover latent topics and document–topic relationships in raw text data.
Details
The workflow begins by reading raw text documents from a CSV file and tokenizing the text into individual words. Common stop words are then removed to reduce noise and improve topic quality. The Count Vectorizer node converts the cleaned tokens into numerical feature vectors of per-document word counts. The LDA node then identifies hidden topics, assigns a topic probability distribution to each document, and extracts the highest-weighted words for each topic. Finally, the Print N Rows node displays a sample of the topic distribution results for review and verification.
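
The steps above can be sketched outside the workflow tool in plain Python. This is a minimal illustration, not the implementation behind the nodes: the sample CSV content, the "text" column name, the tiny stop-word list, the function names, and all parameter values are assumptions made for the example. The LDA step here uses collapsed Gibbs sampling, chosen because it fits in a few lines; the LDA node may use a different inference method.

```python
import csv
import io
import random
import re
from collections import Counter

# Hypothetical stand-in for the CSV file; the "text" column name is an assumption.
CSV_DATA = """text
The cat and the dog play in the garden
A dog chases the cat around the garden
Stocks and bonds rise as the market rallies
The market falls and investors sell stocks
"""

# Tiny illustrative stop-word list, not a real one.
STOP_WORDS = {"the", "a", "and", "in", "as", "around"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def count_vectorize(docs):
    """Return (vocabulary, count vectors), one count vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for word, count in Counter(doc).items():
            row[index[word]] = count
        vectors.append(row)
    return vocab, vectors


def fit_lda(vectors, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns (doc_topic, topic_word): per-document topic probabilities and
    per-topic word probabilities.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(vectors), len(vectors[0])
    # Expand each count vector back into a list of word ids, one per token.
    docs = [[w for w, c in enumerate(row) for _ in range(c)] for row in vectors]
    doc_topic_counts = [[0] * n_topics for _ in range(n_docs)]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(row)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample its topic.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1
    doc_topic = [
        [(doc_topic_counts[d][k] + alpha) / (len(docs[d]) + n_topics * alpha)
         for k in range(n_topics)]
        for d in range(n_docs)
    ]
    topic_word = [
        [(topic_word_counts[k][w] + beta) / (topic_totals[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    return doc_topic, topic_word


# Read CSV -> tokenize -> remove stop words -> count vectors -> LDA.
rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
docs = [remove_stop_words(tokenize(r["text"])) for r in rows]
vocab, vectors = count_vectorize(docs)
doc_topic, topic_word = fit_lda(vectors)

# Highest-probability words per topic.
top_words = [
    [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -dist[w])[:3]]
    for dist in topic_word
]

# Rough equivalent of printing the first N rows of the topic distributions.
for row in doc_topic[:3]:
    print([round(p, 3) for p in row])
```

Each row of `doc_topic` is a probability distribution over topics for one document, which corresponds to the topic distribution the workflow prints for review. With a corpus this small the discovered topics are not meaningful; the sketch only shows how data moves from one step to the next.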