Friday, February 12, 2010

My Two Cents on the Guest Lecture and BI Project Presentation

We attended a guest lecture a few days ago, on Monday. Three guests were invited to speak to us: Carolyn Khiu, Clinique brand manager; Erwin Lim, CRM specialist at Aspial Jewellery; and Mr. Chin, Managing Director of PulseMetrics Pte Ltd. Mr. Chin is also the company supervisor for our major project.

The talk was interesting because it covered BI and data mining in real industry settings. The guests told us how data mining has helped their businesses. It was a very different experience from what we have done in our data mining projects. They also briefed us on the challenges they faced while implementing BI systems. I benefited a lot just from listening to their talks and now understand more about BI (I am sure my friends feel the same way too!).

Then came our BI group project presentation on Wednesday. I have to say that our group faced quite a lot of difficulties and resource constraints in the BI project. First of all, we had no idea how to create dashboards and scorecards, or what to put in them, so that they would suit both the management level and the operational level. On top of that, only one of the labs has the software needed for the project, and we could not always get access to it.

Our tutors helped a lot in this project by giving guidance and feedback. They also encouraged us to do research on our own so that we would learn and not just be spoon-fed by them. Though we spent a lot of effort and time on this project, I believe it has paid off. At least, we have learnt something from the process (such as using the Performa software and creating dashboards/scorecards for different levels).

In addition, the bond between my group members and me has also grown stronger. We have learnt to communicate, share information with each other and help one another so that we could achieve the same goal (to get an "A" or "Z", hohoho~).

Anyhow, the overall learning experience was great. I have also improved some of my data mining knowledge and skills!

Week 15 - Future of Data Warehouse, Data Mining and Data Visualisation

The last of our BI lectures covered the future of data warehousing, data mining and data visualisation. This week, I learnt what to expect from the BI world. Understanding the future of BI allows us to look forward to its development.

Data Warehouse (DW)

Future of Data Warehouse:
  • Coming up with a workable set of rules that ensures data privacy while still facilitating the use of large data sets (also a challenge)
  • Storing unstructured data such as multimedia, maps and sound
The future of data warehousing will not be high-performance disk storage but rather an array of alternative storage.

There are four reasons for using alternative storage:
  1. Data in a DW is stable. It is placed there once and left alone, hence it does not need to be updated at high speed
  2. Queries operating on a DW often require long streams of data stored sequentially
  3. DW is of indeterminate size and is always increasing in volume, requiring flexible capacity
  4. When data gets accessed less often as it ages, it can be moved to secondary storage, making access to newer data more efficient
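
As a rough illustration of the fourth reason, here is a minimal Python sketch that splits rows into a fast tier and a secondary tier based on when they were last accessed. The record layout, the one-year cutoff and the sample dates are all assumptions made up for this example; they do not come from the lecture.

```python
from datetime import date, timedelta

# Hypothetical cutoff: rows not accessed for a year move to secondary storage.
MAX_IDLE = timedelta(days=365)

def split_by_age(rows, today):
    """Split rows into (primary, secondary) lists by last access date."""
    primary, secondary = [], []
    for row in rows:
        idle = today - row["last_accessed"]
        (secondary if idle > MAX_IDLE else primary).append(row)
    return primary, secondary

# Made-up rows for illustration only.
rows = [
    {"id": 1, "last_accessed": date(2010, 1, 20)},
    {"id": 2, "last_accessed": date(2007, 6, 1)},
    {"id": 3, "last_accessed": date(2009, 11, 5)},
]
primary, secondary = split_by_age(rows, today=date(2010, 2, 12))
print([r["id"] for r in primary])    # ids kept on fast storage
print([r["id"] for r in secondary])  # ids moved to secondary storage
```
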
The current trend in data warehousing is about making data readily available, accessible and up to date, with fast implementation. It can be expected that in future more organisations will build Web applications that operate in conjunction with the DW.

Patrick (2005) mentioned that organisations will try to integrate the DW with their ERP systems so that the DW is updated in real time. This is also one of the DW trends in the corporate world.

Data Mining

Data mining is about discovering insights or underlying business value from sets of data. It has helped marketers launch campaigns successfully, cut costs for the company and so on. However, before mining, data has to be integrated, transformed and cleansed. Besides that, there is also the concern of privacy.

The Internet has grown rapidly over the years, and so have the problems of network intrusion. Network intrusions attack the vulnerabilities of a system in order to get hold of its data.

Besides its use in business, data mining can also be used to identify patterns of valid network activity.

The trends in data mining from the lecture slides are:
  • Next generation Internet will connect sites 100 times faster than current speeds
  • Businesses will react more quickly and offer better service, with fewer people and at a lower cost
However, Information Management Magazine (2004) said that the future of data mining lies in predictive analysis. Predictive analysis will be realistic about the complex mixture of business acumen, statistical processing and information technology support that it requires, as well as the fragility of the resulting predictive model, but it will make no assumptions about the limits of predictive analytics.

In addition, it made the very interesting point that data mining technology has not lived up to its promise because "data mining" is a vague and ambiguous term. It overlaps with data profiling, data warehousing and even such approaches to data analysis as online analytic processing (OLAP) and enterprise analytic applications. I agree to a certain extent, based on my experience of doing my major project. It certainly seems that data mining overlaps with other kinds of analysis. Anyhow, data mining is a subset of BI, which may be why there is overlap.

Visualisation

Data visualisation links the critical components and enables the smooth flow of information among them. It can convey technical or complicated matters to management or even to everyone in the organisation. It serves as an important means of communication.

Future Trends
Boundaries between computers, graphics and human knowledge will become blurred. Many advances in technology will be needed to handle the visualisation environment of the future. Intelligent file systems and data management software will contend with thousands of coupled storage devices.

Visualisation takes many forms: dashboards, Google Maps, evolved art and so on. The example below shows another creative way of visualising news.

An innovative way of displaying news: BBC News Browser (taken from lecture slide)

I found an article (in PDF) written by Stephen Few (2007), who is also the author of "Information Dashboard Design: The Effective Visual Communication of Data" (one of the books used as references by our BIT tutors). He wrote about the good and bad trends in data visualisation and also its future direction. It is an interesting article which I think everyone should read.

Week 14 - Implementing Enterprise Business Intelligence System

A company can either outsource its BI to another company or set up a BI function in-house. The in-house option is the Business Intelligence Competency Center (BICC).

According to SAS, BICC is a cross-functional team with a permanent, formal organizational structure. It is owned and staffed by the client and has defined tasks, roles, responsibilities and processes for supporting and promoting the effective use of business intelligence and performance management across the organization.

According to this week's lecture slides, there are some challenges in implementing a BI system:
  • Inconsistent BI deployments
  • Difficulty in managing, implementing and supporting BI initiatives that span multiple departments
  • Lack of standardisation of methodologies, definitions, processes, tools and technologies
  • Insufficient BI skills
  • Need for comprehensive, strategic approach to BI that addresses technology as well as people, processes and organisational culture
Once these challenges are overcome, BI will bring benefits to the organisation.

SAS stated that, according to a recent survey by BetterManagement.com titled "How do you plan for Business Intelligence?", organizations with a BI Competency Center see the following benefits:
  • Increased usage of business intelligence (74 percent)
  • Increased business user satisfaction (48 percent)
  • Better understanding of the value of BI (45 percent)
  • Increased decision-making speed (45 percent)
  • Decreased staff costs (26 percent)
  • Decreased software costs (24 percent)
This shows how a BICC can benefit an organisation.

This week's lecture slides also cover some of the functions of a BICC, which SAS describes as well.

The more common ones are:
  • BI Program Management Office (BI PMO)
  • Data Stewardship
  • Vendor Management
  • Information Management
  • Information Delivery
Functions of BICC (Taken from SAS web site)



There are four steps in establishing a BICC:
  1. Initialisation
  2. Definition of a BICC plan
  3. Establishment of BICC
  4. BICC in operation
These are the factors that an organisation needs to watch out for, as they commonly contribute to the failure of a BICC:
  • Costs
  • Organisational structures/political
  • Lack of management support
  • Technological failures
  • Over-reliance on the BI system and ad-hoc reporting
  • Disruption of well-established reporting cycles
  • Reacting too swiftly, resulting in instability in the organisation
Besides this, a BICC is often not recognised within an organisation. This is because it does not show results/benefits in dollars and cents; instead, it only provides information. In addition, departments may tend not to rely on the BICC because it adds a burden to their budgets and they may not trust what it can do for them. Hence, management support is very important for getting departments to fully utilise the BICC.

I would like to recommend the SAS link (as posted earlier in this post) as it discusses BICC functions in detail.

Week 13 - Text Mining and Web Mining

We were introduced to new mining techniques this week: text mining and web mining.

Both techniques are very different from what we have learnt before. A set of data sometimes contains an attribute like comments. Normally, this "comment" attribute is filtered out because a data mining tool like PASW Modeler 13 is not able to mine it.

Text mining basically means uncovering information hidden in text. It attempts to categorise textual data.
The 3 steps that text mining algorithms involve

According to Wikipedia, 'high quality' in text mining usually refers to some combination of relevance, novelty, and interesting-ness.

There are some challenges to text mining.
  • Handling ambiguities such as spelling and grammar mistakes
  • Text contains acronyms, abbreviations, misspellings (E.g. customer, cus, customar, csmr)
  • Semantic analysis (E.g. book = to reserve something VS book = a manual)
  • Syntax analysis
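
To give a feel for the abbreviation and misspelling challenge, here is a minimal Python sketch that maps known variants of a word to one canonical term before any mining is done. The dictionary contents and the sample comment are made up for illustration, and a real user dictionary would be much larger; this is only one simple way of handling the problem, not necessarily how a text mining tool does it.

```python
import re

# Hypothetical user dictionary: each known variant maps to one canonical term.
USER_DICTIONARY = {
    "cus": "customer",
    "customar": "customer",
    "csmr": "customer",
}

def normalise(comment):
    """Lower-case a comment and replace known variants with the canonical term."""
    words = re.findall(r"[a-z]+", comment.lower())
    return [USER_DICTIONARY.get(word, word) for word in words]

# Made-up comment text for illustration only.
tokens = normalise("Cus called twice; csmr wants a refund")
print(tokens)
```
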
Still, if all the challenges above are solved, patterns and trends can be presented in graphs and could help the organisation greatly, for example:
  • Automatic detection of e-mail spam or phishing
  • Automatic processing of messages or e-mails
  • Analysis of warranty claims, help desk calls/ reports, etc to identify the most common problems and relevant responses
  • Analysis of related scientific publications in journals
  • Filter and match resumes
Next, web mining. It basically does the same thing as text mining, except that it also analyses the log files of web sites.

There are three types of web mining:
  1. Web content mining
  2. Web structure mining
  3. Web usage mining
Typical Web Server Log File

It captures information such as:
  • User's IP address
  • Date and Time
  • Request
  • Status
  • Bytes
  • Previous website (referrer)
  • Website the user requested to go to
  • Internet Browser used
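
The fields above can be pulled out of a log line with a regular expression. The sketch below assumes the log follows the widely used Apache "combined" log format, which may not be exactly the format shown in the lecture slide, and the sample line is made up for illustration.

```python
import re

# Pattern for one line in the (assumed) Apache combined log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
)

# Made-up log line for illustration only.
line = (
    '192.0.2.10 - - [11/Feb/2010:10:15:32 +0800] '
    '"GET /products/page11.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/index.html" "Mozilla/4.0 (compatible; MSIE 7.0)"'
)

match = LOG_PATTERN.match(line)
entry = match.groupdict()  # field name -> captured text
for field, value in entry.items():
    print(f"{field}: {value}")
```
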
Session File

This session file is extracted from the web server log files. It shows the number of clicks a user needs to reach the page they want. This is somewhat similar to "purchase sequence analysis". If page 11 is the most popular page that users normally go to, then the company may want to customise its web site so that users do not need to click many times to get to page 11. This also makes the web site more user-friendly and the pages users want more accessible.
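
Here is a small Python sketch of that idea: find the most visited page and work out how many clicks users needed to reach it. The sessions and page names are made up, and treating each session as a simple list of pages is my own simplification of the session file.

```python
from collections import Counter

# Hypothetical sessions: each list is the sequence of pages one user clicked through.
sessions = [
    ["home", "page3", "page7", "page11"],
    ["home", "page11"],
    ["home", "page2", "page11", "page4"],
    ["home", "page5"],
]

def clicks_to_reach(session, target):
    """Return the number of clicks after the landing page needed to reach target, or None."""
    if target not in session:
        return None
    return session.index(target)

# Count visits per page, ignoring the landing page that every session starts on.
visits = Counter(page for session in sessions for page in session if page != "home")
most_popular, visit_count = visits.most_common(1)[0]

# Click depth of the most popular page in each session that reached it.
depths = [d for d in (clicks_to_reach(s, most_popular) for s in sessions) if d is not None]
average_depth = sum(depths) / len(depths)

print(most_popular, visit_count)
print(depths, average_depth)
```

If the average depth is high for a popular page, that is a hint the page deserves a more direct link.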

Web mining also analyses users' behaviour. For example, web mining observes the buying patterns of a user and then makes recommendations to that user. This involves the marketing technique of cross-selling.
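
One very simple way to produce such recommendations is to count which items are bought together and suggest the most frequent companion. The Python sketch below does this with made-up baskets; it is only an assumption about how a recommendation could be worked out, not the method used by any particular web site.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one per customer order.
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "camera bag"},
    {"memory card", "card reader"},
]

# Count how often each ordered pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

def recommend(item, top_n=1):
    """Suggest the items most often bought together with item."""
    scores = Counter({other: n for (first, other), n in pair_counts.items() if first == item})
    return [other for other, _ in scores.most_common(top_n)]

suggestion = recommend("camera")
print(suggestion)
```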

Example of cross-selling

This is an example of the personalisation of a web site.

Personalisation of web site

Text mining and web mining require a lot of work, especially in data preparation such as creating a user dictionary. However, more than 80% of organisational information is in unstructured textual form, which makes it an untapped gold mine of information.

(All images are taken from Temasek Polytechnic BIT lecture slides)

Thursday, February 11, 2010

Week 12 - Advanced Data Mining Techniques

This week's lecture covered two advanced data mining techniques: regression and neural networks.

There are 3 types of regression models:
  • Linear regression
  • Nonlinear regression
  • Logistic regression
Regression models are normally used to:
  1. Fit data
  2. Time-series data: Forecast
  3. Other data: Predict
I did not really understand what "fit data" means, so I did some googling. It seems that fitting data means finding the line or curve that most closely follows a set of data points, even when the points themselves do not lie exactly on a straight line.

Typically, the best-fitting straight line is drawn.

According to Wikipedia, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.

The following week's lab lesson taught us how to use Microsoft Excel to do a regression analysis using the LINEST function.
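
For a single independent variable, the least-squares straight line that LINEST fits can also be worked out by hand. Here is a minimal Python sketch of that calculation; the monthly sales figures are made up for illustration, and the function name is my own.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: month number against sales.
months = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 27]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # use the fitted line to forecast month 6
print(slope, intercept, forecast)
```

The slope and intercept describe the fitted line, and plugging in a future month gives a simple time-series forecast.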

Besides regression models, neural networks can also be used for prediction, and they provide both supervised and unsupervised modelling. They are used in a similar way to regression models, except that the algorithms are different.

According to the lecture slides, a neural network is a computer technology that attempts to build computers that operate like a human brain. The machines possess simultaneous memory storage and work with ambiguous information.

Neural networks are frequently used for loan application approval and fraud prevention. Like regression, they can also be used for time-series forecasting.

There are two types of Neural Network:
  1. Feed-Forward Neural Network - Supervised Learning
  2. Kohonen Neural Network - Unsupervised Learning
So far, we have only used a feed-forward neural network in our data mining project. However, interpreting a neural network is still quite difficult. This is one of its major cons: it lacks explanation capability.

We have not used the Kohonen neural network because all attributes have to be numeric, so categorical attributes need to be converted into numeric ones first. Again, it also lacks explanation capability.
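
One common way to convert a categorical attribute into numbers is one-hot encoding, where each category becomes its own 0/1 column. The Python sketch below shows the idea with a made-up attribute; it is an assumption about how the conversion could be done, not necessarily what our data mining tool does internally.

```python
def one_hot(values):
    """Turn a list of categorical values into rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows

# Made-up categorical attribute for illustration only.
marital_status = ["single", "married", "single", "divorced"]
categories, encoded = one_hot(marital_status)

print(categories)
for row in encoded:
    print(row)
```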

Regression and neural networks are both interesting data mining techniques. But they also require some skill from miners to convert the data into the format that the data mining tools need.