Topic LLM Cleaner
A class for cleaning topic labels using a generative model.
This class uses a language model to generate cleaner, more coherent labels for a given list of topics. The cleaning process considers the top documents and top terms associated with each topic. Optionally, it also passes the content of the top documents to the model, giving it more context for label generation.
Source code in bunkatopics/topic_modeling/llm_topic_representation.py
__init__(llm, language='english', top_doc=3, top_terms=10, use_doc=False, context='everything')
Initialize the LLMCleaningTopic instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
llm | LLM | The generative model to use for label cleaning. | required |
language | str | Language used for generating labels. Defaults to "english". | 'english' |
top_doc | int | Number of top documents to consider for each topic. Defaults to 3. | 3 |
top_terms | int | Number of top terms to consider for each topic. Defaults to 10. | 10 |
use_doc | bool | Whether to include document contents in label generation. Defaults to False. | False |
context | str | Context for label generation. Defaults to "everything". | 'everything' |
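To illustrate how these parameters might shape the text sent to the model, here is a minimal sketch. It is not the bunkatopics implementation: `CleanerConfig`, `build_prompt`, and the prompt wording are hypothetical names and text invented for this example, and only the parameter names and defaults come from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CleanerConfig:
    """Hypothetical container mirroring the documented parameters and defaults."""

    language: str = "english"
    top_doc: int = 3
    top_terms: int = 10
    use_doc: bool = False
    context: str = "everything"


def build_prompt(config: CleanerConfig, terms: List[str], doc_contents: List[str]) -> str:
    """Assemble an illustrative prompt; the wording is an assumption, not the library's."""
    # Keep only the first `top_terms` terms of the topic.
    lines = [f"Topic keywords: {', '.join(terms[:config.top_terms])}"]
    # Document contents are added only when `use_doc` is enabled,
    # and only for the first `top_doc` documents.
    if config.use_doc:
        for content in doc_contents[: config.top_doc]:
            lines.append(f"Example document: {content}")
    lines.append(
        f"The corpus is about {config.context}. "
        f"Give a short topic label in {config.language}."
    )
    return "\n".join(lines)


prompt = build_prompt(
    CleanerConfig(top_terms=2, use_doc=True, top_doc=1),
    terms=["solar", "wind", "grid"],
    doc_contents=["Solar output rose.", "Wind farms expand."],
)
print(prompt)
```

With `top_terms=2` and `top_doc=1`, only the first two terms and the first document reach the prompt, which is the kind of truncation the `top_terms` and `top_doc` parameters describe.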
fit_transform(topics, docs)
Clean topic labels for a list of topics using the generative model.
This method processes each topic by generating a new, cleaned label based on the top terms and documents associated with the topic. The cleaned labels are then assigned back to the topics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
topics | List[Topic] | List of topics to clean. | required |
docs | List[Document] | List of documents related to the topics. | required |
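The loop described above (generate a label per topic, then assign it back) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: the `Topic` and `Document` dataclasses, their field names, `clean_labels`, and `fake_llm` are all hypothetical stand-ins, and a plain callable replaces the real `LLM` object so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""

    doc_id: str
    content: str


@dataclass
class Topic:
    """Hypothetical stand-in for the library's Topic type."""

    topic_id: str
    name: str
    terms: List[str]
    top_doc_ids: List[str] = field(default_factory=list)


def clean_labels(
    topics: List[Topic],
    docs: List[Document],
    llm: Callable[[str], str],
    top_terms: int = 10,
    top_doc: int = 3,
    use_doc: bool = False,
) -> List[Topic]:
    """Ask `llm` for a new label per topic and write it back to `topic.name`."""
    content_by_id = {doc.doc_id: doc.content for doc in docs}
    for topic in topics:
        prompt = "Keywords: " + ", ".join(topic.terms[:top_terms])
        if use_doc:
            examples = [
                content_by_id[doc_id]
                for doc_id in topic.top_doc_ids[:top_doc]
                if doc_id in content_by_id
            ]
            prompt += "\nDocuments: " + " | ".join(examples)
        # The generated label replaces the original one in place.
        topic.name = llm(prompt).strip()
    return topics


def fake_llm(prompt: str) -> str:
    """Deterministic stub: returns the first keyword, capitalized, with stray spaces."""
    first_line = prompt.splitlines()[0]
    first_keyword = first_line[len("Keywords: "):].split(", ")[0]
    return f" {first_keyword.title()} "


topics = [Topic("t1", "solar | wind | grid", ["solar", "wind", "grid"], ["d1"])]
docs = [Document("d1", "Solar output rose.")]
cleaned = clean_labels(topics, docs, fake_llm, use_doc=True)
print(cleaned[0].name)  # Solar
```

The sketch mutates the topics it receives, which matches the description that cleaned labels are assigned back to the topics; whether the actual method also returns them is not stated in this page.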