Clickstream Example Database

The Clickstream Example Database is a simple star schema that records the clicks made by users on a website. This data can be analyzed for business or marketing purposes, for example, or to detect malicious activity on the website. Each table is described in a separate section.

The Clickstream schema is focused on discovering interesting and useful information from Web content and usage.

The data in the Clickstream schema is populated by parsing web server logs, users' browsing activities and habits, and similar sources. This data can be used to track malicious and fraudulent activity in real time. The schema is geared toward recognizing patterns using statistical models, manual offline analysis, or SQL queries.
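As an illustration of the log-parsing step, the following is a minimal sketch that extracts click attributes from a single web server log line. It assumes the logs use the NCSA Common Log Format; the source does not state which format the example data comes from. The names `CLF_PATTERN` and `parse_log_line` and the dictionary keys are hypothetical and do not correspond to columns of the Clickstream tables.

```python
import re
from datetime import datetime

# Assumption: logs follow the NCSA Common Log Format; real servers may be configured differently.
CLF_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of click attributes, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the text fields that have a natural numeric or date type.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record

# Hypothetical sample line using a documentation-reserved IP address.
sample = '192.0.2.10 - alice [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
click = parse_log_line(sample)
```

Each parsed record could then be split into fact and dimension rows before loading; that mapping depends on the table definitions in the sections that follow.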

The schema is intended to answer the following questions, for fraud detection or other purposes:

  1. How many users access the web server from a given server IP per day? This helps determine whether a particular server is clogging the network or is involved in a malicious attack.
  2. Which client IP is generating an excessively large number of hits?
  3. Which customer (Client_IP) address is downloading a huge amount of data?
  4. Which customer is coming from more than one client IP?
  5. Which customer is creating a large number of sessions per day?
  6. On which page do users stay for the longest duration?
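Questions 2 and 4 above can be sketched as aggregate queries. The example below is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database, and a simplified, flattened `ClickStream_Fact` table whose columns (`customer_id`, `client_ip`, `bytes_sent`) are hypothetical. The actual fact table is part of a star schema and is described in its own section, so real queries would likely join to the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified fact table; these column names are assumptions.
conn.execute("""
    CREATE TABLE ClickStream_Fact (
        customer_id INTEGER,
        client_ip   TEXT,
        bytes_sent  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO ClickStream_Fact VALUES (?, ?, ?)",
    [
        (1, "192.0.2.1", 500),
        (1, "192.0.2.2", 700),
        (2, "192.0.2.1", 300),
        (2, "192.0.2.1", 900),
        (3, "192.0.2.3", 100),
    ],
)

# Question 2: number of hits per client IP, busiest first.
hits_per_ip = conn.execute("""
    SELECT client_ip, COUNT(*) AS hits
    FROM ClickStream_Fact
    GROUP BY client_ip
    ORDER BY hits DESC, client_ip
""").fetchall()

# Question 4: customers seen from more than one client IP.
multi_ip_customers = conn.execute("""
    SELECT customer_id, COUNT(DISTINCT client_ip) AS ip_count
    FROM ClickStream_Fact
    GROUP BY customer_id
    HAVING COUNT(DISTINCT client_ip) > 1
    ORDER BY customer_id
""").fetchall()
```

The same `GROUP BY` pattern extends to the other questions, for example summing `bytes_sent` per client IP for question 3.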

Table Name              Default Number of Rows
ClickStream_Fact        5000000
Customer_Dimension      5000
Session_Dimension       50000
UserAgent_Dimension     500
IPAddress_Dimension     1000
Page_Dimension          5000
CreditCard_Dimension    5000

In This Chapter

ClickStream_Fact

Customer_Dimension

CreditCard_Dimension

Date_Dimension

IPAddress_Dimension

Page_Dimension

Session_Dimension

UserAgent_Dimension

clickstream_query_01.sql

clickstream_query_02.sql

clickstream_query_03.sql

clickstream_query_04.sql

clickstream_query_05.sql