The New "Bill of Rights" of Information Society


Presentation Transcript

Slide 1

The New "Bill of Rights" of Information Society
Raj Reddy and Jaime Carbonell
Carnegie Mellon University
March 23, 2006, Talk at Google

Slide 2

New Bill of Rights
Get the right information, e.g. search engines
To the right people, e.g. classification, routing
At the right time, e.g. just-in-time (task modeling, planning)
In the right language, e.g. machine translation
With the right level of detail, e.g. summarization
In the right medium, e.g. access to information in non-textual media

Slide 3

Relevant Technologies
Search engines: "… right information"
Classification, routing: "… right people"
Anticipatory analysis: "… right time"
Machine translation: "… right language"
Summarization: "… right level of detail"
Speech input and output: "… right medium"

Slide 4

"… right information" Search Engines

Slide 5

The Right Information
Right information from future search engines
How to go beyond just "relevance to query" (all) and "popularity"?
Eliminate massive redundancy, e.g. "online email":
  Should not result in multiple links to different Yahoo sites promoting their email, or even non-Yahoo sites discussing just Yahoo email.
  Should result in a link to Yahoo email, one to MSN email, one to Gmail, one that compares them, etc.
First show trusted information sources and user-community-verified sources
  At least for important information (medical, financial, educational, …), I want to trust what I read, e.g., for new medical treatments, first information from hospitals, medical schools, the AMA, medical publications, etc., and NOT from Joe Shmo's quack practice page or from the National Enquirer.
Maximal Marginal Relevance
Novelty Detection
Named Entity Extraction

Slide 6

Beyond Pure Relevance in IR
Current information retrieval technology only maximizes relevance to the query.
What about information novelty, timeliness, appropriateness, validity, comprehensibility, density, medium, ...??
Novelty is approximated by non-redundancy!
We really want to maximize relevance to the query, given the user profile and interaction history:
  $P(U(f_1, \ldots, f_n) \mid Q \,\&\, \{C\} \,\&\, U \,\&\, H)$
  where Q = query, {C} = collection set, U = user profile, H = interaction history
...but we don't yet know how. Darn.

Slide 7

Maximal Marginal Relevance versus Standard Information Retrieval
[Figure: documents and a query, comparing MMR ranking with standard IR ranking]
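
Maximal Marginal Relevance reranks results so that each newly selected document is relevant to the query but not redundant with documents already chosen: at each step it picks the unselected document D_i maximizing λ·Sim(D_i, Q) − (1 − λ)·max over selected D_j of Sim(D_i, D_j). The sketch below is a minimal illustration only, assuming whitespace tokenization, raw term counts, cosine similarity for both Sim functions, and a hypothetical default λ = 0.5; the names `cosine` and `mmr_rerank` are illustrative, not from the talk.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rerank(query: str, docs: list[str], k: int, lam: float = 0.5) -> list[int]:
    """Greedily select up to k document indices by Maximal Marginal Relevance.

    Assumption: raw term counts stand in for whatever document
    representation a real search engine would use.
    """
    query_vec = Counter(query.lower().split())
    doc_vecs = [Counter(d.lower().split()) for d in docs]
    selected: list[int] = []
    candidates = list(range(len(docs)))

    def marginal_score(i: int) -> float:
        relevance = cosine(doc_vecs[i], query_vec)
        # Redundancy = similarity to the closest already-selected document.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_score)  # ties go to the earlier index
        selected.append(best)
        candidates.remove(best)
    return selected


docs = [
    "yahoo online email",
    "yahoo online email",  # duplicate of the first result
    "gmail online email",
    "msn online email",
]
print(mmr_rerank("online email", docs, k=3))           # → [0, 2, 3]
print(mmr_rerank("online email", docs, k=3, lam=1.0))  # → [0, 1, 2]
```

With λ = 1 the redundancy term vanishes and the ranking reduces to plain relevance ordering, so the duplicate survives; with λ = 0.5 MMR skips it in favor of a different email provider.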

Slide 8

Novelty Detection
Find the first report of a new event
(Unconditional) dissimilarity with the past
Decision threshold on the most-similar story
(Linear) temporal decay
Length filter (for teasers)
Cosine similarity with standard weights
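
The ingredients listed above can be combined into a simple first-story detector: compare each incoming story with every earlier one, scale the similarity down linearly with the time gap, and flag the story as new if even the best decayed match falls below a threshold. This is a minimal sketch under stated assumptions: raw term counts instead of the slide's "standard weights", and hypothetical values for the threshold (0.2), the decay window (30 days), and the teaser cutoff (5 tokens).

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def first_stories(stories, threshold=0.2, window_days=30.0, min_tokens=5):
    """Return indices of stories flagged as the first report of a new event.

    stories: list of (day, text) pairs in arrival order.
    Similarity to a past story decays linearly to zero after window_days.
    Stories shorter than min_tokens are treated as teasers and skipped.
    """
    past = []  # (day, term vector) of every story kept so far
    flagged = []
    for idx, (day, text) in enumerate(stories):
        tokens = text.lower().split()
        if len(tokens) < min_tokens:
            continue  # length filter for teasers
        vec = Counter(tokens)
        best = 0.0  # similarity to the most-similar earlier story
        for past_day, past_vec in past:
            decay = max(0.0, 1.0 - (day - past_day) / window_days)
            best = max(best, decay * cosine(vec, past_vec))
        if best < threshold:
            flagged.append(idx)
        past.append((day, vec))
    return flagged


stream = [
    (0, "plane crash near airport kills dozens"),
    (1, "investigators examine plane crash near airport"),
    (2, "breaking news"),
    (3, "central bank raises interest rates sharply"),
    (60, "plane crash near airport kills dozens"),
]
print(first_stories(stream))  # → [0, 3, 4]
```

Story 1 is a follow-up, story 2 is dropped as a teaser, and story 4 is flagged because the temporal decay has driven its similarity to the old crash coverage to zero.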

Slide 9

New First Story Detection Directions
Topic-conditional models
  e.g. "plane," "investigation," "FAA," "FBI," "casualties" → topic, not event
  "TWA 800," "March 12, 1997" → event
  First categorize into topic, then use maximally discriminative terms within the topic
Rely on situated named entities
  e.g. "Arcan as casualty," "Sharon as peacemaker"

Slide 10

Link Detection in Texts
Find texts (e.g. news stories) that mention the same underlying events.
Could be combined with novelty (e.g. something new about an interesting event).
Techniques: text similarity, NEs, situated NEs, relations, topic-conditioned models, …
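
Two of the techniques listed above, text similarity and shared named entities, can be blended into a single link score. The sketch below assumes the entities have already been extracted, and its equal weighting (alpha = 0.5), Jaccard overlap for entity sets, and 0.4 link threshold are hypothetical choices made for illustration, not values from the talk.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two named-entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_score(text_a: str, ents_a: set[str], text_b: str, ents_b: set[str],
               alpha: float = 0.5) -> float:
    """Blend word-level similarity with shared named entities."""
    sim = cosine(Counter(text_a.lower().split()), Counter(text_b.lower().split()))
    return alpha * sim + (1 - alpha) * jaccard(ents_a, ents_b)


def is_linked(text_a: str, ents_a: set[str], text_b: str, ents_b: set[str],
              threshold: float = 0.4) -> bool:
    """Decide whether two stories discuss the same underlying event."""
    return link_score(text_a, ents_a, text_b, ents_b) >= threshold


a = ("Kantor arrives in Singapore for trade talks", {"Mickey Kantor", "Singapore"})
b = ("Trade envoy Kantor meets ministers in Singapore", {"Mickey Kantor", "Singapore"})
c = ("Stock markets fell sharply on Friday", {"Wall Street"})
print(is_linked(*a, *b), is_linked(*a, *c))  # → True False
```

Adding situated NEs or relations would mean comparing richer entity records than plain strings; the blending step stays the same.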

Slide 11

Named-Entity Identification
Purpose: to answer questions such as:
  Who is mentioned in these 100 Society articles?
  What locations are listed in these 2000 web pages?
  What companies are mentioned in these patent applications?
  What products were evaluated by Consumer Reports this year?

Slide 12

Named Entity Identification
President Clinton decided to send special trade representative Mickey Kantor to the special Asian economic meeting in Singapore this week. Ms. Xuemei Peng, trade minister from China, and Mr. Hideto Suzuki from Japan's Ministry of Trade and Industry will also attend. Singapore, which is hosting the meeting, will probably be represented by its foreign and economic ministers. The Australian representative, Mr. Langford, will not attend, though no reason has been given. The parties hope to reach a framework for currency stabilization.

Slide 13

Methods for NE Extraction
Finite-state transducers with variables
  Example output: FNAME: "Bill" LNAME: "Clinton" TITLE: "President"
FSTs learned from labeled data
Statistical learning (also from labeled data)
  Hidden Markov Models (HMMs)
  Exponential (maximum-entropy) models
  Conditional Random Fields [Lafferty et al]
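
For the HMM approach listed above, tagging a sentence means finding the most probable hidden tag sequence with the Viterbi algorithm. The sketch below is a toy: the three tags (O, PER, LOC) and every probability are hypothetical, hand-set numbers standing in for parameters that would normally be learned from labeled data, and it tags "President" as O rather than producing the TITLE field shown in the FST example.

```python
import math

STATES = ("O", "PER", "LOC")
# Hypothetical hand-set toy parameters; a real tagger estimates these
# from labeled training data.
START = {"O": 0.6, "PER": 0.2, "LOC": 0.2}
TRANS = {
    "O": {"O": 0.6, "PER": 0.2, "LOC": 0.2},
    "PER": {"O": 0.5, "PER": 0.4, "LOC": 0.1},
    "LOC": {"O": 0.6, "PER": 0.1, "LOC": 0.3},
}
EMIT = {
    "O": {"president": 0.2, "met": 0.2, "in": 0.3},
    "PER": {"clinton": 0.4, "kantor": 0.4},
    "LOC": {"singapore": 0.5, "china": 0.4},
}
UNSEEN = 1e-4  # emission probability for words a state never produced


def viterbi(words: list[str]) -> list[str]:
    """Return the most probable tag sequence under the toy HMM (log space)."""
    if not words:
        return []

    def emit(state: str, word: str) -> float:
        return math.log(EMIT[state].get(word.lower(), UNSEEN))

    # best[s] = (log prob of the best path ending in state s, that path)
    best = {s: (math.log(START[s]) + emit(s, words[0]), [s]) for s in STATES}
    for word in words[1:]:
        step = {}
        for s in STATES:
            # Best predecessor state for reaching s at this position.
            prev = max(STATES, key=lambda p: best[p][0] + math.log(TRANS[p][s]))
            score = best[prev][0] + math.log(TRANS[prev][s]) + emit(s, word)
            step[s] = (score, best[prev][1] + [s])
        best = step
    return max(best.values(), key=lambda entry: entry[0])[1]


sentence = "President Clinton met Kantor in Singapore".split()
print(list(zip(sentence, viterbi(sentence))))
```

Working in log space avoids numeric underflow on long sentences; the dynamic program keeps only one best path per state at each position.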

Slide 14

Named Entity Identification
Extracted Named Entities (NEs)
People                Places
President Clinton     Singapore
Mickey Kantor         Japan
Ms. Xuemei Peng       China
Mr. Hideto Suzuki     Australia
Mr. Langford

Slide 15

Role-Situated NEs
Motivation: it is useful to know the roles of NEs:
  Who participated in the economic meeting?
  Who hosted the economic meeting?
  Who was discussed at the economic meeting?
  Who was absent from the economic meeting?

Slide 16

Emerging Methods for Extracting Relations
Link parsers at clause level
  Based on dependency grammars
  Probabilistic enhancements [Lafferty, Venable]
Island-driven parsers
  GLR* [Lavie], Chart [Nyberg, Placeway], LC-Flex [Rosé]
Treebank-trained probabilistic CF parsers [IBM, Collins]
Herald the return of deep(er) NLP techniques.
Relevant to the new Q/A from free-text initiative.
Too complex for inductive learning (today).

Slide 17

Relational NE Extraction
Example: (Who does What to Whom)
"John Snell reporting for Wall Street. Today Flexicon Inc. announced a tender offer for Supplyhouse Ltd. for $30 per share, representing a 30% premium over Friday's closing price. Flexicon expects to acquire Supplyhouse by Q4 2001 without problems from government regulators"

Slide 18

Fact Extraction Application
Useful for relational DB filling, to prepare data for "standard" DM/machine-learning methods
Acquirer    Acquiree      Sh.price  Year
----------------------------------------
Flexicon    Logi-truck    18        1999
Flexicon    Supplyhouse   30        2001
                          10        2000
...         ...           ...       ...
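
Filling one row of the Acquirer / Acquiree / Sh.price / Year table can be sketched with a single hand-written pattern applied to a sentence adapted from the Slide 17 example. This is only a stand-in for the learned extractors the slides discuss: the regular expressions, `Acquisition`, and `extract_acquisition` are hypothetical and cover just this one phrasing.

```python
import re
from typing import NamedTuple, Optional


class Acquisition(NamedTuple):
    """One row of the Acquirer / Acquiree / Sh.price / Year table."""
    acquirer: str
    acquiree: str
    share_price: float
    year: int


# Hypothetical hand-written patterns; they only cover this one phrasing.
OFFER = re.compile(
    r"(?P<acquirer>[A-Z][\w-]*) Inc\. announced a tender offer for "
    r"(?P<acquiree>[A-Z][\w-]*) Ltd\. for \$(?P<price>\d+(?:\.\d+)?) per share"
)
YEAR = re.compile(r"\bQ[1-4] (?P<year>\d{4})\b")


def extract_acquisition(text: str) -> Optional[Acquisition]:
    """Return an Acquisition row if both patterns match, else None."""
    offer = OFFER.search(text)
    year = YEAR.search(text)
    if offer is None or year is None:
        return None
    return Acquisition(
        acquirer=offer["acquirer"],
        acquiree=offer["acquiree"],
        share_price=float(offer["price"]),
        year=int(year["year"]),
    )


story = (
    "Today Flexicon Inc. announced a tender offer for Supplyhouse Ltd. for $30 "
    "per share, representing a 30% premium over Friday's closing price. "
    "Flexicon expects to acquire Supplyhouse by Q4 2001 without problems "
    "from government regulators"
)
print(extract_acquisition(story))
```

Each extracted `Acquisition` maps directly onto a database row, which is what makes the output usable by "standard" data-mining and machine-learning tools.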

Slide 19

"… right people" Text Categorization

Slide 20

The Right People
User-focused search is key
  If a 7-year-old working on a school project about taking good care of one's heart types in "heart care", she will want links to pages like "You and your friendly heart", "Tips for taking good care of your heart", "Introduction to how the heart works", etc., NOT the latest New England Journal of Medicine article on "Cardiological implications of immuno-active proteases".
  If a cardiologist issues the query, exactly the opposite is desired.
  Search engines must know their users better, and the users' tasks.
Social affiliation groups for search and for automatically categorizing, prioritizing and routing incoming information or search results
  New machine learning technology allows for scalable, high-accuracy hierarchical categorization.
  Family group, Organization group, Country group, Disaster-affected group, Stockholder group

Slide 21

Text Categorization
Assign labels to each document or web page
  Labels may be topics, such as Yahoo categories: finance, sports, News → World → Asia → Business
  Labels may be genres: articles, movie reviews, news
  Labels may be routing codes: send to marketing, send to customer service

Slide 22

Text Categorization Methods
Manual assignment, as in Yahoo
Hand-coded rules, as in Reuters
Machine learning (dominant paradigm)
  Words in text become predictors
  Category labels become "to be predicted"
  Predictor-feature reduction (SVD, χ², …)
  Apply any inductive method: kNN, NB, DT, …
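
As one instance of the "NB" option above, a multinomial Naive Bayes classifier treats the words of a document as predictors and the category label as the value to be predicted. This is a minimal sketch assuming whitespace tokenization and add-one (Laplace) smoothing; the class name and the tiny training set are hypothetical, and no feature reduction (SVD, χ²) is applied.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes: words are predictors, the category is predicted."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, doc: str) -> str:
        # Words never seen in training carry no evidence, so drop them.
        words = [w for w in doc.lower().split() if w in self.vocab]
        best_label, best_score = "", float("-inf")
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus add-one-smoothed log likelihood of each word.
            score = math.log(n_label / self.n_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = [
    "stocks fell as markets closed",
    "bank shares and bond markets rally",
    "team wins the championship game",
    "striker scores twice in final match",
]
train_labels = ["finance", "finance", "sports", "sports"]
clf = NaiveBayesClassifier().fit(train_docs, train_labels)
print(clf.predict("markets and stocks"), clf.predict("the final game"))
```

Swapping in kNN or a decision tree would keep the same framing (word features in, label out) and change only the inductive method.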

Slide 23

Multi-level Event Classification

Slide 24

"… right time" Just-in-Time: No Sooner or Later

Slide 25

Just-in-Time Information
Get the information to the user exactly when it is needed
  Immediately when the information is requested
  Prepositioned if it takes time to fetch & download (e.g. HDTV video); requires anticipatory analysis and pre-fetching
How about "push technology" for, e.g., stock alerts, reminders, breaking news?
  Depends on user activity:
    Sleeping, Do Not Disturb, or in a meeting → wait your chance
    Reading email → now if the information is urgent, later otherwise
    Group information before delivering (e.g. show 3 stock alerts together)
    Information directly relevant to the user's current task → immediately
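
The push-delivery rules above can be read as a small decision policy plus a batching step. The sketch below is one possible encoding, not the authors' system: the `Activity` values, the return labels "now" / "later" / "wait", the rule that busy states take precedence over task relevance, and the "later" default for other activities are all assumptions.

```python
from enum import Enum


class Activity(Enum):
    SLEEPING = "sleeping"
    DO_NOT_DISTURB = "do not disturb"
    IN_MEETING = "in meeting"
    READING_EMAIL = "reading email"
    OTHER = "other"


BUSY = {Activity.SLEEPING, Activity.DO_NOT_DISTURB, Activity.IN_MEETING}


def delivery_decision(activity: Activity, urgent: bool, relevant_to_task: bool) -> str:
    """Return "now", "later", or "wait" for one pushed item.

    Assumption: busy states take precedence over task relevance, and
    non-relevant items during any other activity default to "later".
    """
    if activity in BUSY:
        return "wait"  # wait your chance
    if relevant_to_task:
        return "now"  # directly relevant to the current task
    if activity is Activity.READING_EMAIL:
        return "now" if urgent else "later"
    return "later"


def batch_alerts(alerts: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (kind, message) alerts so same-kind items are delivered together."""
    groups: dict[str, list[str]] = {}
    for kind, message in alerts:
        groups.setdefault(kind, []).append(message)
    return groups


print(delivery_decision(Activity.READING_EMAIL, urgent=True, relevant_to_task=False))
print(batch_alerts([("stock", "alert 1"), ("news", "headline"),
                    ("stock", "alert 2"), ("stock", "alert 3")]))
```

Keeping the policy as a pure function makes it easy to swap the precedence order if, for example, task-relevant items should interrupt a meeting.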

Slide 26

"… right language" Translation

Slide 27

Access to Multilingual Information
Language identification (from text, speech, handwriting)
Trans-lingual retrieval (query in 1 language, results in multiple languages)
  Requires more than out-of-context translation of query words (see Carbonell et al 1997 IJCAI paper) to do it well
Full translation (e.g. of web pages, of search-result snippets, …)
  General reading quality (as targeted now)
  Focused on getting entities right (who, what, where, when mentioned)
Partial on-demand translation
  Reading assistant: translation in context while reading an original document, by highlighting unfamiliar words, phrases, passages
On-demand text-to-speech
Transliteration
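
For the text case of language identification, a common simple approach compares character n-gram profiles. The sketch below uses cosine similarity between character-trigram counts; the tiny English and Spanish training snippets and all function names are hypothetical, and real profiles would be built from far more text per language.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Character trigram counts, with whitespace collapsed and the text padded."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram-count vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def identify_language(text: str, profiles: dict[str, Counter]) -> str:
    """Return the language whose trigram profile is most similar to the text."""
    grams = trigrams(text)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))


# Hypothetical toy training snippets; real profiles need much more text.
PROFILES = {
    "en": trigrams("the quick brown fox jumps over the lazy dog "
                   "and the cat sat on the mat with the hat"),
    "es": trigrams("el rápido zorro marrón salta sobre el perro perezoso "
                   "y el gato se sentó en la alfombra con el sombrero"),
}
print(identify_language("the dog and the cat", PROFILES))
print(identify_language("el perro y el gato", PROFILES))
```

Character trigrams capture frequent function words and spelling patterns ("the", " el"), which is why even short inputs can often be classified.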

Slide 28

Knowledge-Engineered MT
  Transfer-rule MT (commercial systems)
  High-accuracy interlingual MT (domain focused)
Parallel-Corpus-Trainable MT
  Statistical MT (noisy channel, exponential models)
  Example-based MT (generalized G-EBMT)
  Transfer-rule learning MT (corpus & informants)
Multi-Engine MT
  Omnivorous approach: combines the above to maximize coverage & minimize errors
"… in the Right Language"

Slide 29

Types of Machine Translation
[Figure: paths from Source (Arabic) to Target (English) via Interlingua, Semantic Analysis, Sentence Planning, Transfer Rules, Syntactic Parsing, and Text Generation; Direct: EBMT]

Slide 30

EBMT example
English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.
En