WordCloud Architect is a versatile text analysis engine that turns text stored in local Excel workbooks into word clouds and frequency reports. It works with space-delimited languages (English, Portuguese, etc.) and also includes a dedicated Japanese Natural Language Processing (NLP) module.
Japanese text is written without spaces between words, so it cannot simply be split on whitespace. An integrated morphological analysis module built on the Janome library segments this continuous text into meaningful units, so the word clouds and statistical reports stay accurate regardless of the language.
- Universal WordCloud Generation: Process any .xlsx file by pointing to the desired text and keyword columns.
- Advanced Japanese Module: Uses Janome for morphological analysis, extracting Nouns (名詞) and Adjectives (形容詞) from non-spaced text.
- Local-First & Secure: Runs entirely on your local machine. No internet connection, cloud APIs, or Google Sheets credentials are required.
- Smart Stopword Filtering: Reads an external stopwords.txt file (UTF-8) to remove noise words in several languages at once.
- Automated Data Reporting: Exports a frequency report (.xlsx) containing the top 300 terms for each analyzed keyword.
- Professional CJK Rendering: Font management ensures Japanese characters display correctly (no "tofu" blocks).
- Safe Filename Sanitization: Regex-based cleaning ensures generated images are saved correctly, even when keywords contain characters that are illegal in filenames on your OS.
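The filename sanitization step can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name safe_filename, the replacement character, the fallback stem, and the exact character class (the characters Windows forbids in filenames plus ASCII control characters) are all assumptions.

```python
import re

# Assumed character class: characters Windows forbids in filenames
# (\ / : * ? " < > |) plus ASCII control characters.
_ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(keyword: str, replacement: str = "_", default: str = "keyword") -> str:
    """Hypothetical helper: turn a keyword into a safe file stem."""
    cleaned = _ILLEGAL_CHARS.sub(replacement, keyword).strip()
    # Windows silently drops trailing dots and spaces, so remove them up front.
    cleaned = cleaned.rstrip(". ")
    # Fall back to a default stem if nothing usable is left.
    return cleaned or default
```

For example, safe_filename('AI/ML: "trends"?') + ".png" yields AI_ML_ _trends__.png, while Japanese keywords such as 日本語 pass through unchanged.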
graph TD
Data[Local Excel File] -->|Pandas| Loader[Data Loader]
Loader -->|Text Stream| NLP[Multilingual / Japanese Engine]
NLP -->|POS Tagging| Filter[Multi-Layer Filter]
Filter -->|Stopwords Filter| Counter[Frequency Counter]
Counter -->|Frequency Dict| Cloud[WordCloud Generator]
Cloud -->|Matplotlib| Render[Visual Export .png]
Counter -->|Counter List| Report[Excel Report .xlsx]
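The Frequency Counter stage of the diagram can be sketched with the standard library's collections.Counter. This is an illustrative sketch, not the engine's code: the helper name frequencies_by_keyword is hypothetical, rows stands for (keyword, text) pairs read from the two Excel columns, and tokenize is a placeholder for the multilingual / Japanese engine plus its filters.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

TOP_N = 300  # The report keeps the top 300 terms per keyword.


def frequencies_by_keyword(
    rows: Iterable[Tuple[str, object]],
    tokenize: Callable[[str], List[str]],
    top_n: int = TOP_N,
) -> Dict[str, List[Tuple[str, int]]]:
    """Hypothetical helper: count tokens per keyword and keep the top_n terms."""
    counters: Dict[str, Counter] = defaultdict(Counter)
    for keyword, text in rows:
        # Empty cells (None, or NaN when read through pandas) are skipped.
        if not isinstance(text, str):
            continue
        counters[keyword].update(tokenize(text))
    # most_common returns (term, count) pairs, most frequent first.
    return {kw: counter.most_common(top_n) for kw, counter in counters.items()}
```

The same pairs can feed both outputs in the diagram: dict(pairs) suits the wordcloud library's WordCloud.generate_from_frequencies, and a pandas DataFrame built from the pairs can be written to the .xlsx report with to_excel (the column layout of the real report is not specified here).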
wordcloud_from_excel/
├── input/ # Source Data (Excel & Stopwords)
├── output/ # Generated PNGs and Reports
├── venv/ # Python Virtual Environment
├── generate_wordcloud.py # Core Engine & Config
├── requirements.txt # Project Dependencies
└── README.md # Documentation
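A script laid out like this typically resolves its folders relative to its own location. The sketch below is an assumption about how that could look, not the contents of generate_wordcloud.py; the helper name ensure_layout is hypothetical.

```python
from pathlib import Path
from typing import Tuple


def ensure_layout(base_dir) -> Tuple[Path, Path]:
    """Hypothetical helper: locate input/ and create output/ if it is missing."""
    base = Path(base_dir)
    input_dir = base / "input"    # Source Excel file and stopwords.txt
    output_dir = base / "output"  # Generated PNGs and reports
    if not input_dir.is_dir():
        raise FileNotFoundError(f"Expected source data in {input_dir}")
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir
```

Calling ensure_layout(Path(__file__).resolve().parent) from the script makes the folders independent of the current working directory.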
- Python 3.9+
- CJK Font: For Japanese support, ensure a compatible font is installed (e.g., msgothic.ttc on Windows or Noto Sans CJK on Linux).
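One way to pick a CJK font at runtime is to try a list of known locations. The candidate paths below are assumed typical install locations (MS Gothic on Windows, the Debian/Ubuntu fonts-noto-cjk package on Linux), and find_cjk_font is a hypothetical helper, not part of the project.

```python
from pathlib import Path
from typing import Iterable

# Assumed typical locations; adjust for your system.
CANDIDATE_FONTS = (
    r"C:\Windows\Fonts\msgothic.ttc",                          # Windows (MS Gothic)
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # Debian/Ubuntu fonts-noto-cjk
)


def find_cjk_font(candidates: Iterable[str] = CANDIDATE_FONTS) -> str:
    """Hypothetical helper: return the first candidate font file that exists."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    raise FileNotFoundError("No CJK font found; install one or add its path to the candidates.")
```

The result can be passed to the wordcloud library as WordCloud(font_path=find_cjk_font()); without a CJK-capable font_path, Japanese words render as empty boxes.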
- Clone the Repository
- Initialize Virtual Environment
  python -m venv venv
  Activate on Windows: .\venv\Scripts\activate
  Activate on Unix: source venv/bin/activate
- Install Dependencies
  pip install -r requirements.txt
- Execute Analysis
  python generate_wordcloud.py
Japanese text is a continuous stream of characters. WordCloud Architect acts as a "Linguistic Chef", slicing that stream into words according to morphological rules. It prioritizes:
- 名詞 (Nouns): To capture the core subjects.
- 形容詞 (Adjectives): To capture sentiments and qualities.
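The noun/adjective selection can be sketched against Janome's token objects, which expose surface, part_of_speech (a comma-separated string such as "名詞,一般,*,*") and base_form. The helper name extract_content_words and the use_base_form option are assumptions for illustration; whether the engine keeps surface forms or dictionary forms is not stated here.

```python
from typing import Iterable, List, Sequence

# Major part-of-speech categories to keep: nouns and adjectives.
KEEP_POS = ("名詞", "形容詞")


def extract_content_words(
    tokens: Iterable, keep: Sequence[str] = KEEP_POS, use_base_form: bool = False
) -> List[str]:
    """Hypothetical helper: keep tokens whose major POS category is in keep.

    tokens: objects with .surface, .part_of_speech and .base_form attributes,
    such as those yielded by janome.tokenizer.Tokenizer().tokenize(text).
    """
    words = []
    for token in tokens:
        # The first comma-separated field is the major category (名詞, 助詞, ...).
        major_pos = token.part_of_speech.split(",")[0]
        if major_pos in keep:
            # base_form maps inflected adjectives back to their dictionary form.
            words.append(token.base_form if use_base_form else token.surface)
    return words
```

With Janome installed, extract_content_words(Tokenizer().tokenize(text)) (after from janome.tokenizer import Tokenizer) returns the nouns and adjectives; particles (助詞) and auxiliary verbs (助動詞) are dropped because their major category is not in KEEP_POS. Setting use_base_form=True merges inflected adjective forms such as 美しかっ into 美しい, so they are counted as one term.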
The engine applies four distinct filters:
- Grammatical Filter: Removes particles (助詞) and auxiliary verbs (助動詞).
- Internal Dictionary Filter: Removes words found in a built-in set of standard Japanese stopwords.
- External User Filter: Reads stopwords.txt to remove custom noise (e.g., "PDF", "Click", "Views").
- Structural Filter: Removes single-character tokens and numeric strings.
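The dictionary, user, and structural filters can be sketched as below; the grammatical filter corresponds to the part-of-speech selection and is not repeated here. Everything in this block is an assumption for illustration: the helper names, the tiny BUILTIN_JA_STOPWORDS set (a stand-in, not the engine's real list), the one-word-per-line file format, and the choice to compare words case-insensitively so that "Click" and "click" are treated alike.

```python
import re
from pathlib import Path
from typing import Iterable, List, Set

# Illustrative stand-in for the built-in Japanese stopword set (assumption).
BUILTIN_JA_STOPWORDS = {"こと", "もの", "ため", "よう", "これ", "それ"}

# Numeric strings such as "2024", "3.5" or "1,000" (\d also matches full-width digits).
_NUMERIC = re.compile(r"[\d.,]+")


def load_stopwords(path) -> Set[str]:
    """Hypothetical helper: read one stopword per line from a UTF-8 file."""
    # utf-8-sig also accepts files saved with a byte-order mark (e.g. by Notepad).
    text = Path(path).read_text(encoding="utf-8-sig")
    return {line.strip().casefold() for line in text.splitlines() if line.strip()}


def apply_filters(
    words: Iterable[str], user_stopwords: Set[str], builtin: Set[str] = BUILTIN_JA_STOPWORDS
) -> List[str]:
    """Hypothetical helper: apply dictionary, user and structural filters."""
    stop = {w.casefold() for w in builtin} | set(user_stopwords)
    kept = []
    for word in words:
        word = word.strip()
        if len(word) < 2:             # Structural: single-character tokens
            continue
        if _NUMERIC.fullmatch(word):  # Structural: numeric strings
            continue
        if word.casefold() in stop:   # Internal dictionary + external user filter
            continue
        kept.append(word)
    return kept
```

For example, with a stopwords.txt listing PDF, Click and Views, the words ["PDF", "花見", "2024", "a", "こと", "Python", "3.5", "click"] are reduced to ["花見", "Python"].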
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Rubens Braz


