Data Mining. Chris Nelson, CS 157A, Fall 2007
Slide 2: Data Mining. New buzzword, old idea. Inferring new information from previously collected data. Traditionally the job of Data Analysts. Computers have changed this: it is far more efficient to comb through data with a machine than to eyeball statistical data.
Slide 3: Data Mining – Two Main Components. Wikipedia definition: "Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data." Knowledge Discovery: Concrete information gleaned from known data. Information you may not have known, but which is supported by recorded facts (e.g., the Diapers and Beer example from a previous presentation). Knowledge Prediction: Uses known data to forecast future trends, events, etc. (e.g., stock market predictions). Wikipedia note: "some data mining systems, such as neural networks, are inherently geared towards prediction and pattern recognition, rather than knowledge discovery." These include applications in AI and symbol analysis.
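The Diapers and Beer example is a classic association rule. A minimal sketch of the two numbers usually attached to such a rule, support and confidence, is below. The transactions are invented toy data, and the function names are illustrative, not from any particular library.

```python
# Toy market-basket transactions (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


if __name__ == "__main__":
    s = support({"diapers", "beer"}, TRANSACTIONS)
    c = confidence({"diapers"}, {"beer"}, TRANSACTIONS)
    print(f"support={s:.2f} confidence={c:.2f}")
```

On this toy data the rule "diapers => beer" holds in 3 of 5 baskets (support 0.60) and in 3 of the 4 baskets containing diapers (confidence 0.75). Real miners (e.g., Apriori-style algorithms) search all candidate itemsets rather than checking one rule by hand.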
Slide 4: Data Mining vs. Data Analysis, in terms of software and the marketing thereof. Data Mining != Data Analysis. Data Mining implies that the software uses some intelligence beyond simple collection and partitioning of data to infer new information. Data Analysis is more in line with standard statistical software (e.g., web stats). These usually present information about subsets and relations within the recorded data set (e.g., browser/search engine usage, average visit time, etc.).
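For contrast, here is a sketch of what plain data analysis looks like: descriptive statistics (browser share, average visit time) computed over a recorded data set, with nothing new inferred. The log format and values are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical web-log visits: (browser, visit length in seconds).
VISITS = [("Firefox", 120), ("IE", 45), ("Firefox", 300), ("Safari", 90), ("IE", 60)]


def summarize(visits):
    """Descriptive statistics only: reports what is already in the data."""
    browsers = Counter(browser for browser, _ in visits)
    share = {browser: n / len(visits) for browser, n in browsers.items()}
    avg_visit = mean(seconds for _, seconds in visits)
    return share, avg_visit


if __name__ == "__main__":
    share, avg_visit = summarize(VISITS)
    print(share, avg_visit)
```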
Slide 5: Data Mining Subtypes. Data Dredging: The process of scanning a data set for relations and then coming up with a hypothesis for the existence of those relations. Metadata: Data that describes other data. It can describe an individual element or a collection of elements. Wikipedia example: "In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location." Applications for Data Dredging in business include Market and Risk Analysis, and trading strategies. Applications in science include disaster prediction.
Slide 6: Propositional vs. Relational Data. Older data mining methods relied on Propositional Data: data related to a single, central element that can be represented in vector format (e.g., the purchasing history of a single user; Amazon uses such vectors in its related-item suggestions [a multidimensional dot product]). Current, advanced data mining methods rely on Relational Data: data that can be stored and modeled easily using relational databases. An example would be data used to represent interpersonal relations. Relational Data is more interesting to miners than Propositional data because an entity, and all the entities to which it is related, factor into the inference process.
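The "multidimensional dot product" on propositional data can be sketched with cosine similarity between purchase vectors: find the user whose vector points in the most similar direction and suggest what they bought. This is a generic user-similarity sketch with made-up users and items; it is not Amazon's actual recommendation system.

```python
import math

# Hypothetical purchase vectors: one position per catalog item, 1 = bought.
ITEMS = ["book_a", "book_b", "dvd_c", "game_d", "cd_e"]
USERS = {
    "alice": [1, 1, 0, 0, 1],
    "bob": [1, 1, 1, 0, 0],
    "carol": [0, 0, 0, 1, 0],
}


def cosine(u, v):
    """Dot product of u and v, normalized by their lengths (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(target, users):
    """Name of the other user whose vector is closest in direction to target's."""
    others = (name for name in users if name != target)
    return max(others, key=lambda name: cosine(users[target], users[name]))


def suggest(target, users, items):
    """Items the most similar user bought that the target has not."""
    peer = most_similar(target, users)
    return [item for item, mine, theirs in zip(items, users[target], users[peer])
            if theirs and not mine]


if __name__ == "__main__":
    print(suggest("alice", USERS, ITEMS))
```

Each user here is a single self-contained vector, which is exactly what makes this data propositional; a relational approach would also weigh who the user is connected to.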
Slide 7: Key Component of Data Mining. Whether Knowledge Discovery or Knowledge Prediction, data mining takes information that was once very hard to detect and presents it in an easily understandable format (e.g., graphical or statistical). Data mining techniques involve sophisticated algorithms, including Decision Tree Classification, Association detection, and Clustering. Since data mining is not on the test, I will keep things shallow.
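Of the techniques listed, clustering is the easiest to show in a few lines. Below is a minimal k-means sketch on hypothetical 2-D points, with starting centroids supplied by the caller so the result is deterministic; production implementations add smarter initialization and better handling of empty clusters.

```python
import math

# Hypothetical 2-D data: two visually obvious groups.
POINTS = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will not change
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    centers, groups = kmeans(POINTS, [(0.0, 0.0), (10.0, 10.0)])
    print(centers)
    print(groups)
```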
Slide 8: Uses of Data Mining. AI/Machine Learning: Combinatorial/Game Data Mining is good for analyzing winning strategies in games, and thus for developing intelligent AI opponents (e.g., Chess). Business Strategies: Market Basket Analysis identifies customer demographics, preferences, and purchasing patterns. Risk Analysis: Product Defect Analysis analyzes product defect rates for given plants and predicts possible complications (read: lawsuits) down the line.
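A very simple form of product defect analysis is comparing each plant's defect rate with the pooled rate across all plants. The records below and the "1.5 times the pooled rate" cutoff are hypothetical; a real risk analysis would use proper statistical tests and far more variables.

```python
# Hypothetical (plant, units_built, defective_units) records.
RECORDS = [
    ("plant_north", 10_000, 120),
    ("plant_south", 8_000, 88),
    ("plant_east", 12_000, 420),
]


def defect_rates(records):
    """Defect rate per plant as defective / built."""
    return {plant: defects / built for plant, built, defects in records}


def flag_plants(records, factor=1.5):
    """Plants whose defect rate exceeds factor times the pooled rate (an arbitrary illustrative rule)."""
    total_built = sum(built for _, built, _ in records)
    total_defects = sum(defects for _, _, defects in records)
    pooled = total_defects / total_built
    return sorted(plant for plant, rate in defect_rates(records).items() if rate > factor * pooled)


if __name__ == "__main__":
    print(defect_rates(RECORDS))
    print(flag_plants(RECORDS))
```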
Slide 9: Uses of Data Mining (Continued). User Behavior Validation: Fraud Detection. With cell phones, comparing phone activity to calling records can detect calls made on cloned phones. Similarly, with credit cards, comparing new purchases against historical purchases can detect activity on stolen cards.
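Comparing new purchases to historical ones can be sketched as a z-score test: flag an amount that sits too many standard deviations from the cardholder's usual spending. The history and the threshold of 3 are assumptions for illustration; real fraud systems combine many signals (location, merchant, timing), not amount alone.

```python
from statistics import mean, stdev

# Hypothetical past purchase amounts for one cardholder.
HISTORY = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0]


def is_suspicious(history, amount, threshold=3.0):
    """True if amount lies more than threshold sample standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # every past purchase identical: any change stands out
    return abs(amount - mu) / sigma > threshold


if __name__ == "__main__":
    for amount in (49.99, 1200.0):
        print(amount, is_suspicious(HISTORY, amount))
```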
Slide 10: Uses of Data Mining (Continued). Health and Science: Protein Folding. Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds"). Extra-Terrestrial Intelligence: Scanning radio telescope data for possible transmissions from other planets. For more information, see Stanford's Folding@home and UC Berkeley's SETI@home projects. Both involve participation in a widely distributed computing application.
Slide 11: Sources of Data for Mining. Databases (the most obvious), Text Documents, Computer Simulations, Social Networks.
Slide 12: Privacy Concerns. Public and government databases are mined, but people have raised, and continue to raise, concerns. Wikipedia quote: "data mining produces information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics."
Slide 13: Prevalence of Data Mining. Your data is already being mined, regardless. Many web services require that you allow access to your information [for data mining] in order to use the service. Google mines email data in Gmail accounts to present account owners with ads. Facebook requires users to allow access to information from non-Facebook pages. Facebook privacy policy: "We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services and other users of Facebook, to supplement your profile." This allows access to your blog's RSS feed (relatively harmless), as well as information obtained through partner sites (cause for concern).
Slide 14: Data Mining Controversies. The latest one: Facebook's Beacon advertising program (it just popped up on Slashdot within the last week). What Beacon does: "when you engage in consumer activity at a [Facebook] partner website, such as Amazon, eBay, or the New York Times, not only will Facebook record that activity, but your Facebook connections will also be informed of your purchases or actions." [taken from http://trickytrickywhiteboy.blogspot.com/2007/11/be careful with facebooks-beacon.html]
Slide 15: Controversies (Continued). Implications: "So where Facebook used to collect data only within the confines of its own site, it will now extend that ability to harvest data across other sites that it partners with. Some of the companies that have signed on to participate on the advertising side include Coca-Cola, Sony, Verizon, Comcast, Ebay and the CBC. The initial list of 44 partner websites participating on the data collection side includes the New York Times, Blockbuster, Amazon, eBay, LiveJournal, and Epicurious." [Remember the privacy policy on the previous slide.] The verdict is still out. This may violate an old (100+ years) New York law prohibiting advertising that uses endorsements without the endorser's consent. Facebook currently offers users no way to opt out of Beacon (once it has been activated?). Users can close their accounts, but account data is never deleted.
Slide 16: Bottom Line. Information obtained through data mining is incredibly valuable. Companies are naturally reluctant to give up data they have obtained. Expect the prevalence of data mining, and of (possibly subversive) tactics, to increase in years to come.
Slide 17: Recommended Resources and Works Consulted.
Wikipedia, "Data mining" entry: http://en.wikipedia.org/wiki/Data_mining
Steve Rambam, "Privacy is Dead - Get Over It: Revisited", HOPE Number Six lecture: http://www.hopenumbersix.net/speakers.html#pid2
"Facebook's Faux Pas", Newsweek: http://www.newsweek.com/id/69275
"Beware of Facebook's Beacon": http://trickytrickywhiteboy.blogspot.com/2007/11/be careful with facebooks-beacon.html
Facebook Data Mining guide: http://saunderslog.com/2007/11/25/facebook-statistical surveying mysteries/
"Data Mining in Social Networks": http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf