Business Intelligence Reflection: Text Mining

We were introduced to two new mining techniques this week: text mining and web mining.

Both techniques are very different from what we have learnt before. A data set will sometimes contain a free-text attribute such as comments. Normally, this "comments" attribute is filtered out, because a data mining tool like PASW Modeler 13 is not able to mine it.

Text mining basically means uncovering information hidden in text. It attempts to categorise textual data.

The 3 steps that text mining algorithms involve

According to Wikipedia, text mining is the process of deriving high-quality information from text, where 'high quality' usually refers to some combination of relevance, novelty, and interestingness.

There are some challenges to text mining.

Handling ambiguities such as spelling and grammar mistakes
Text that contains acronyms, abbreviations and misspellings (e.g. customer, cus, customar, csmr)
Semantic analysis (e.g. book = to reserve something vs. book = a manual)
Syntax analysis
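
To make the abbreviation and misspelling challenge more concrete, here is a minimal sketch of how a user dictionary could map the variants above to one canonical term before mining. The dictionary contents and the function name `normalise` are my own assumptions for illustration, not something taken from the lecture or from PASW Modeler.

```python
import re

# Hypothetical user dictionary: maps variants seen in the text to one canonical term.
USER_DICTIONARY = {
    "cus": "customer",
    "customar": "customer",
    "csmr": "customer",
}


def normalise(text):
    """Lower-case the text, split it into words, and replace known variants."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [USER_DICTIONARY.get(token, token) for token in tokens]


print(normalise("Cus called twice; csmr wants a refund"))
```

A lookup table like this only handles variants that someone has already listed, which is probably why preparing the dictionary takes so much effort.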

Still, once the challenges above are overcome, the patterns and trends found can be presented in graphs and can help the organisation greatly, for example:

Automatic detection of e-mail spam or phishing
Automatic processing of messages or e-mails
Analysis of warranty claims, help desk calls/reports, etc. to identify the most common problems and relevant responses
Analysis of related scientific publications in journals
Filter and match resumes

Next, web mining. It basically does the same thing as text mining, except that it also analyses the log files of web sites.

There are three types of web mining:

Web content mining
Web structure mining
Web usage mining

Typical Web Server Log File

It will capture information such as:

User's IP address
Date and Time
Request
Status
Bytes
Previous Website
Website the user requests to go to
Internet Browser used
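
As a rough sketch of how these fields could be pulled out of one log line, the example below assumes the log is in the common Apache "combined" format. The lecture slide may have used a different layout, and the sample line, the pattern name `LOG_PATTERN` and the function `parse_log_line` are made up for illustration.

```python
import re

# Assumption: the server writes the Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line, not taken from a real server.
sample = (
    '192.0.2.10 - - [12/Feb/2010:10:15:32 +0800] '
    '"GET /products/page11.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/index.html" "Mozilla/5.0"'
)

entry = parse_log_line(sample)
for field, value in entry.items():
    print(f"{field}: {value}")
```

Here `referrer` corresponds to the previous website and `browser` to the Internet browser used.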

Session File

The session file is extracted from the web server log files. It shows the number of clicks users need to make in order to reach the page they want. This is somewhat similar to "purchase sequence analysis". If page 11 is the most popular page, then the company may want to customise its web site so that users do not need to click many times to get to page 11. This also makes the web site more user-friendly and makes the pages users want more accessible.
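
The click-counting idea can be sketched as follows, assuming each session has already been reduced to the ordered list of pages one visitor viewed. The session data and the function names `clicks_to_reach` and `average_clicks` are hypothetical and only meant to illustrate the analysis.

```python
from statistics import mean

# Hypothetical sessions: each list is the ordered pages one visitor viewed.
sessions = [
    ["home", "catalogue", "category", "page11"],
    ["home", "search", "page11"],
    ["home", "about"],
    ["home", "catalogue", "category", "offers", "page11"],
]


def clicks_to_reach(session, target):
    """Clicks made after the landing page until the target first appears.

    Returns None if the visitor never reached the target page.
    """
    if target not in session:
        return None
    return session.index(target)


def average_clicks(all_sessions, target):
    """Average number of clicks over the sessions that reached the target."""
    counts = [clicks_to_reach(session, target) for session in all_sessions]
    reached = [count for count in counts if count is not None]
    return mean(reached) if reached else None


print(average_clicks(sessions, "page11"))
```

If the average comes out high for a popular page, that would be a hint to add a more direct link to it.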

Web mining also analyses users' behaviour. For example, web mining can observe a user's buying patterns and then make recommendations to that user. This involves the marketing technique of cross-selling.
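
One simple way to picture such recommendations is to count which items are bought together and suggest the most frequent companions. This is only a sketch under my own assumptions: the baskets are invented, the names `build_co_purchase_counts` and `recommend` are hypothetical, and real web sites likely use more sophisticated methods.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase history: each set is one customer's basket.
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "camera bag"},
    {"laptop", "mouse"},
]


def build_co_purchase_counts(all_baskets):
    """For every item, count how often each other item shared a basket with it."""
    counts = {}
    for basket in all_baskets:
        for item, other in permutations(basket, 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def recommend(item, counts, top_n=1):
    """Suggest the items most often bought together with the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


co_counts = build_co_purchase_counts(baskets)
print(recommend("camera", co_counts))
```

With these baskets, a customer looking at a camera would be shown the memory card, because the two appear together most often.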

Example of cross-selling

This is an example of the personalisation of a web site.

Personalisation of web site

Text mining and web mining require a lot of work, especially in data preparation, such as creating a user dictionary. However, more than 80% of organisational information is in unstructured textual form, which makes it an untapped gold mine of information.

(All images are taken from Temasek Polytechnic BIT lecture slides.)

Friday, February 12, 2010

Week 13 - Text Mining and Web Mining
