INTRODUCTION
Data Mining and Privacy
Privacy. What is privacy? Privacy is the “quality or state of being apart from company or observation” or “freedom from unauthorized intrusion” (Merriam-Webster 2002). Privacy concerns have taken a more significant role in American society than ever before as big corporations have begun to build huge warehouses to store personal data. The major concern is not that the corporations are building the warehouses; it is what they intend to do with the data. That is what concerns the American people, along with whether or not the data collection violates the Fourth Amendment, which states:
The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized (United States Senate 2002).
A report released by Ontario Information and Privacy Commissioner Ann Cavoukian, “Data Mining: Staking a Claim on Your Privacy,” states that data mining “may be the most fundamental challenge that privacy advocates will face in the next decade.” The Internet, however, creates many threats to our personal privacy. ...
In this thesis we will discuss what data mining is, what data mining is used for, and who is doing the data mining.
LITERATURE REVIEW
What is Data Mining?
Data mining is the process of finding hidden or unknown pieces of information in large databases. Data mining makes it possible to piece together similarities and relationships and then use the newly discovered information to make sound, well-informed business decisions. Data mining, then, “centers on the automated discovery of new facts and relationships in data. The raw material is the business data, and the data mining algorithm is the excavator, sifting through the vast quantities of raw data looking for the valuable nuggets of business information” (Thearling 1998).
The Data Mining Glossary from Two Crows states that “Data Mining is an information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results” (Two Crows 2002). ... In the evolution from business data to business information, each new step has built upon the previous one. According to Kurt Thearling (1996), these are the steps in the evolution of data mining:
| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? ... " | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? ... | | | |

Steps in the Evolution of Data Mining
The table indicates that data mining was emerging as early as 1996; it is still emerging today, however, as businesses use data mining even more. As the need to gain a competitive advantage continues to fuel big business, data mining will continue to evolve and its uses will become limitless.
What Can Data Mining Do?
Data mining can perform six tasks. The first three are all examples of directed data mining, where the goal is to use the available data to build a model that describes one particular variable of interest in terms of the rest of the available data. ... In directed data mining, we try to find patterns that predict whether that variable takes the value 0 or 1. The next three tasks are examples of undirected data mining, where no variable is singled out as a target and the goal is to establish some relationship among all the variables. In the previous bankruptcy example, undirected data mining tries to identify patterns in the behavior of customers without indicating whether or not those customers are in bankruptcy. ... Data mining can be used to obtain all sorts of information, including but not limited to the following:
1. Discover information within data warehouses that queries and reports cannot effectively reveal
2. Find patterns in data and infer rules
3. Use patterns and rules to guide decision-making and forecasting
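Items 2 and 3 in the list above can be made concrete with a small sketch. The Python below is only an illustration, not a method taken from the sources cited in this paper: the shopping transactions, the function names `support` and `infer_rules`, and the two thresholds are all hypothetical. It counts how often pairs of items appear together (a pattern) and keeps the "if A then B" rules that hold often enough to guide a decision.

```python
from itertools import combinations

# Hypothetical market-basket data; each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def infer_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    The thresholds are illustrative assumptions, not recommended values.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to count as a pattern
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the transactions with lhs, the share that also have rhs.
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


for lhs, rhs, sup, conf in infer_rules(transactions):
    print(f"if {lhs} then {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data the only rules that pass both thresholds link bread and milk. A business could act on such a rule, for example by shelving the two products together, which is the decision-making use described in item 3.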
There are five common types of information that can be obtained by data mining: 1) association, 2) description and visualization, 3) classification, 4) clusters, and 5) forecasting or prediction.
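The difference between classification (a directed task) and clustering (an undirected task) can be shown on the same toy data. The following Python is a minimal sketch under stated assumptions: the customer records, the debt-to-income feature, and the helper names `fit_threshold` and `two_means` are hypothetical and stand in for the far richer models real tools use. The directed function uses the 0 or 1 bankruptcy target to learn a cut-off; the undirected function never sees the target and simply groups similar customers.

```python
# Hypothetical customer records: (debt_to_income_ratio, went_bankrupt).
customers = [
    (0.10, 0), (0.15, 0), (0.20, 0), (0.30, 0),
    (0.70, 1), (0.80, 1), (0.85, 1), (0.95, 1),
]


def fit_threshold(records):
    """Directed: choose the cut-off on the ratio that best predicts the 0/1 target."""
    values = sorted(x for x, _ in records)
    # Candidate cut-offs are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(cut):
        return sum((x > cut) == bool(y) for x, y in records) / len(records)

    return max(candidates, key=accuracy)


def two_means(values, iterations=20):
    """Undirected: split values into two clusters with 1-D k-means (k = 2).

    No target variable is used; assumes the values are not all identical.
    """
    low, high = min(values), max(values)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Index 1 if v is nearer the high centre, otherwise index 0.
            groups[abs(v - high) < abs(v - low)].append(v)
        low = sum(groups[0]) / len(groups[0])
        high = sum(groups[1]) / len(groups[1])
    return groups


cut = fit_threshold(customers)
low_group, high_group = two_means([x for x, _ in customers])
print("directed cut-off:", cut)
print("undirected clusters:", low_group, high_group)
```

In this sketch both approaches happen to separate the same two groups of customers, but only the directed model can say that the high-ratio group is the one at risk of bankruptcy, because only it was given the target variable.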