Is tagging a form of data mining?
I am implementing a small CRM system, and data mining to predict and find opportunities and trends is essential for such systems. One data mining approach is clustering. This is a very small CRM project that uses Java to provide the interface for retrieving information from a database.
My question is this: when I insert a customer into the database, I have a text field that allows customers to be tagged on their way into the database, i.e. at the registration point.
Would you regard tagging as a form of clustering? If so, is it a data mining technique?
I am sure there are complex APIs, such as the Java Data Mining API, that support data mining. But for the sake of my project, I just want to know whether tagging users with keywords (the way Stack Overflow lets you tag a question when posting it) is a form of data mining, since those tags make it easy to find trends and patterns through searching.
Solution 1:[1]
To make it short, yes, tags are additional information that will make data mining easier to conduct later on.
They probably won't be enough, though. Tags are linked to entities and, depending on how you compute them, they might not show interesting relations between different entities. With your tagging system, the only usable relation I see is 'has same tag', and that might not be enough.
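To make the 'has same tag' relation concrete, here is a minimal Java sketch of an in-memory tag index. The class `TagIndex`, its methods, and the sample customers are all hypothetical names chosen for illustration; a real CRM would more likely keep tags in a database table and query it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: every name here is illustrative, not from the original post.
class TagIndex {
    // Maps each normalized tag to the customers carrying it (sorted for stable output).
    private final Map<String, Set<String>> customersByTag = new TreeMap<>();

    // Record the tags typed into the registration form's text field.
    void register(String customer, String... tags) {
        for (String tag : tags) {
            String key = tag.trim().toLowerCase();
            if (!key.isEmpty()) {
                customersByTag.computeIfAbsent(key, k -> new TreeSet<>()).add(customer);
            }
        }
    }

    // The "has same tag" relation: every customer sharing the given tag.
    Set<String> customersWith(String tag) {
        return customersByTag.getOrDefault(tag.trim().toLowerCase(), new TreeSet<>());
    }

    // A simple descriptive summary: how many customers carry each tag.
    Map<String, Integer> tagCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        customersByTag.forEach((tag, customers) -> counts.put(tag, customers.size()));
        return counts;
    }

    public static void main(String[] args) {
        TagIndex index = new TagIndex();
        index.register("Alice", "wholesale", "Priority");
        index.register("Bob", "retail");
        index.register("Carol", "wholesale");
        System.out.println(index.customersWith("wholesale")); // [Alice, Carol]
        System.out.println(index.tagCounts()); // {priority=1, retail=1, wholesale=2}
    }
}
```

This supports lookups and counts over groups you defined yourself, which is useful, but it does not discover any grouping you did not already encode in the tags.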
Clustering your data can be done using community detection techniques on graphs built using your data and relations between entities. This example is in Python and uses the networkx library but it might give you an idea of what I'm talking about: http://perso.crans.org/aynaud/communities/
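Since the project in the question is written in Java, here is a rough Java sketch of the graph idea. Note the assumption: it uses plain connected components found by breadth-first search, which is a much cruder grouping than the community detection methods referred to above, and it is not a port of the linked Python library. The class `SharedTagGraph` and the sample data are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: connected components as a simple stand-in for community detection.
class SharedTagGraph {
    // Groups customers into connected components of the "shares at least one tag" graph.
    static List<Set<String>> components(Map<String, Set<String>> tagsByCustomer) {
        // Build adjacency sets: an edge joins two customers with a common tag.
        Map<String, Set<String>> neighbours = new LinkedHashMap<>();
        for (String a : tagsByCustomer.keySet()) {
            neighbours.put(a, new TreeSet<>());
        }
        for (String a : tagsByCustomer.keySet()) {
            for (String b : tagsByCustomer.keySet()) {
                boolean shareTag = !Collections.disjoint(tagsByCustomer.get(a), tagsByCustomer.get(b));
                if (!a.equals(b) && shareTag) {
                    neighbours.get(a).add(b);
                }
            }
        }
        // Breadth-first search from each unvisited customer collects one component.
        List<Set<String>> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String start : neighbours.keySet()) {
            if (!visited.add(start)) continue; // already placed in a component
            Set<String> component = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String current = queue.poll();
                component.add(current);
                for (String next : neighbours.get(current)) {
                    if (visited.add(next)) queue.add(next);
                }
            }
            result.add(component);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tags = new LinkedHashMap<>();
        tags.put("Alice", Set.of("wholesale", "priority"));
        tags.put("Bob", Set.of("retail"));
        tags.put("Carol", Set.of("priority", "export"));
        tags.put("Dave", Set.of("export"));
        System.out.println(components(tags)); // [[Alice, Carol, Dave], [Bob]]
    }
}
```

Alice and Dave share no tag directly, yet they land in the same group through Carol. That kind of indirect relation is what a graph view adds over a plain tag search.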
Solution 2:[2]
Yes, tagging is definitely one way of grouping your users. However, it’s different from ‘clustering.’ Here’s why: you’re making a conscious decision about how you want to group them, but there may be better or different user groups, based on a range of behaviors, that are not obvious to you.
Clustering methods are unsupervised learning methods that can help you uncover these patterns. These methods are “unsupervised” because you don’t have a specific target variable; rather, you want to find the groups or patterns that are most prominent in the data. You can feed CRM data to clustering algorithms to uncover ‘hidden’ relationships.
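As one illustration of what such an algorithm does, here is a minimal k-means sketch in Java. k-means is only one of many clustering methods and is my choice for the example, not something prescribed above. The two features (orders per month, average order value), the data, and the naive "first k points" initialization are all assumptions made for the sketch; real data would normally be scaled first, and a library implementation would be preferable.

```java
import java.util.Arrays;

// Minimal k-means sketch; the feature choice and the data are hypothetical.
class KMeansSketch {
    // Assigns each point to one of k clusters and returns the cluster index per point.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        // Naive initialization: use the first k points as starting centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0;
                    for (int d = 0; d < dims; d++) {
                        double diff = points[i][d] - centroids[c][d];
                        dist += diff * diff; // squared Euclidean distance
                    }
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched cluster

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid unchanged
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical features per customer: {orders per month, average order value}.
        double[][] customers = {
            {1, 20}, {12, 210}, {2, 25}, {11, 190}, {1, 30}, {13, 200}
        };
        System.out.println(Arrays.toString(cluster(customers, 2, 100))); // [0, 1, 0, 1, 0, 1]
    }
}
```

No tag or label is supplied here: the two groups (occasional low-value buyers and frequent high-value buyers) come out of the numbers alone, which is the contrast with manual tagging.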
Also, if you’re using ‘tagging,’ it’s more of a descriptive analytics problem: you have well-defined groups in the data, and you’re identifying their behavior. Clustering would be a predictive analytics problem: algorithms will try to predict groups based on the user behavior they recognize in the data.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Pierre Mourlanne |
Solution 2 | Sonia Kumat |