Natural Language Processing

Natural Language Processing. Julia Hirschberg, COMS 4705, Fall 2010. What is Natural Language Processing? Software that can understand, analyze and generate text and speech, also known as computational linguistics. At Columbia: Michael Collins, CS, parsing, machine translation

Presentation Transcript

Slide 1

Natural Language Processing. Julia Hirschberg. COMS 4705, Fall 2010. CS 4705

Slide 2

What is Natural Language Processing?
Software that can understand, analyze and generate text and speech
AKA computational linguistics
At Columbia:
Michael Collins, CS: parsing, machine translation
Mona Diab, CCLS: semantics
Nizar Habash, CCLS: morphology, machine translation
Julia Hirschberg, CS: spoken language processing
Kathy McKeown, CS: summarization, generation
Becky Passonneau, CCLS: dialog systems, reference resolution
Owen Rambow, CCLS: syntax, parsing

Slide 3

Why is NLP hard? A few headlines:
Something Went Wrong In Jet Crash, Expert Says
Police Begin Campaign To Run Down Jaywalkers
Drunk Gets Nine Months In Violin Case
Farmer Bill Dies In House
Iraqi Head Seeks Arms
Enraged Cow Injures Farmer With Ax
Stud Tires Out
Eye Drops Off Shelf
Teacher Strikes Idle Kids
Squad Helps Dog Bite Victim

Slide 4

What will we learn about in this course?
Morphology: how words are formed
Syntax: how words are grouped into larger constituents and phrases, and how these phrases can be ordered
Semantics: the context-independent "meaning" of utterances
Pragmatics: the context-dependent "meaning" of utterances
Goal: What is a speaker/writer intending to convey?

Slide 5

Morphology
Stud tires out: Is 'stud' an adjective or a noun? 'tires': a noun or a verb?
Web search: 'union activities in New York'
What to search for? Union/unions; activities/activity
Active? Activity? Actor? Actual? Academic?
New vs. New York, York vs. yorkie
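The web-search example turns on how far a query term should be reduced before matching. As a rough illustration, assuming a deliberately crude prefix-truncation rule (the hypothetical helpers `naive_stem` and `expand_query` below, not the Porter stemmer or any real search engine), the sketch shows that cutting words to a shared prefix correctly merges union/unions but also sweeps actor and actual in with activities, and conflates York with yorkie.

```python
def naive_stem(word: str, n: int = 3) -> str:
    """Hypothetical crude stemmer: lowercase the word and keep its first n letters."""
    return word.lower()[:n]


def expand_query(term: str, vocabulary: list[str], n: int = 3) -> list[str]:
    """Return every vocabulary word whose naive stem matches the stem of `term`."""
    stem = naive_stem(term, n)
    return sorted(w for w in vocabulary if naive_stem(w, n) == stem)


# A small made-up vocabulary drawn from the slide's examples.
VOCAB = ["union", "unions", "activities", "activity", "active",
         "actor", "actual", "York", "yorkie"]

print(expand_query("unions", VOCAB))      # ['union', 'unions']
print(expand_query("activities", VOCAB))  # ['active', 'activities', 'activity', 'actor', 'actual']
print(expand_query("York", VOCAB))        # ['York', 'yorkie']
```

A real stemmer strips suffixes by rule rather than keeping a fixed-length prefix, but it faces the same trade-off between merging true variants and merging unrelated words.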

Slide 6

Syntax
Constituent structure: Teacher Strikes Idle Kids; Enraged Cow Injures Farmer With Ax
Word order, position and meaning:
John hit Bill. Bill was hit by John. Bill, John hit. Who John hit was Bill. I said John hit Bill. John hits Bill.
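"Enraged Cow Injures Farmer With Ax" is a case of attachment ambiguity: "with ax" can modify the verb (the cow used an ax) or the noun (the farmer has an ax). A minimal sketch, assuming a small hypothetical grammar in Chomsky normal form written only for this headline, uses the CKY algorithm to count how many distinct parse trees the sentence has.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "A", "NP"),
    ("NP", "NP", "PP"),  # PP attaches to the noun: the farmer has the ax
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb: the cow used the ax
    ("PP", "P", "NP"),
]
LEXICON = {
    "enraged": {"A"},
    "cow": {"NP"},
    "farmer": {"NP"},
    "ax": {"NP"},
    "injures": {"V"},
    "with": {"P"},
}


def count_parses(words: list[str], start: str = "S") -> int:
    """CKY chart parsing that counts trees instead of just recognizing."""
    n = len(words)
    # chart[i][j][X] = number of distinct trees rooted in X spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON.get(w, ()):
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]


print(count_parses("enraged cow injures farmer with ax".split()))  # 2
```

Dropping the `NP -> NP PP` rule leaves only the verb-attachment reading and the count falls to 1; broad-coverage grammars license many such attachments, which is one reason ordinary sentences can have very many parses.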

Slide 7

Semantics
Word meaning: semantic components
John caught a bad cold. John caught a big wave. John caught Radio Netherlands on his radio.
Is meaning compositional? Squad Helps Dog Bite Victim; Enraged Cow Injures Farmer With Ax
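The three "caught" sentences use one verb in three different senses. One standard baseline for choosing among senses, swapped in here purely as an illustration, is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most content words with the sentence. The sense labels, glosses and stopword list below are hypothetical, written for this example rather than taken from a real dictionary.

```python
import re

# Hypothetical stopword list and sense inventory for "catch".
STOPWORDS = {"a", "an", "the", "on", "his", "or", "of", "as", "such", "for", "with", "up"}
SENSES = {
    "catch/illness": "become infected with an illness such as a cold or the flu",
    "catch/capture": "seize something that is moving, such as a ball, a fish or a wave",
    "catch/receive": "pick up a broadcast signal, for example a station on the radio",
}


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(sentence: str, senses: dict[str, str] = SENSES) -> str:
    """Return the sense whose gloss overlaps most with the sentence (first wins ties)."""
    context = content_words(sentence)
    return max(senses, key=lambda s: len(content_words(senses[s]) & context))


for s in ["John caught a bad cold.",
          "John caught a big wave.",
          "John caught Radio Netherlands on his radio."]:
    print(s, "->", simplified_lesk(s))
```

With glosses this short, most words in a real sentence match nothing, so ties (broken here by dictionary order via `max`) are common; practical systems use longer glosses, example sentences or learned models.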

Slide 8

Pragmatics
Going Home, a play in one act (courtesy of Bonnie Dorr)
Scene 1: Pennsylvania Station, NY
Bonnie: Long Beach?
Bystander: Downstairs, LIRR Station.
Scene 2: Ticket Counter, LIRR Station
Bonnie: Long Beach?
Agent: $4.50.

Slide 9

Scene 3: Information Booth, LIRR Station
Bonnie: Long Beach?
Representative: 4:19, Track 17.
Scene 4: On the train, vicinity of Forest Hills
Bonnie: Long Beach?
Conductor: Change at Jamaica.
Scene 5: On the next train, vicinity of Lynbrook
Bonnie: Long Beach?
Conductor: Right after Island Park.

Slide 10

Algorithms
Rule-based/symbolic: parsers and morphological analyzers; finite-state automata
Probabilistic/statistical: learned from observation of (labeled) data; predicting new data on the basis of old; machine learning
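The two families of algorithms can be contrasted in a few lines. The first half is a hand-written, rule-based deterministic finite-state automaton for the "sheep talk" language baa+! (the letter b, two or more a's, then !), a standard introductory automaton example; the second half is a statistical model whose bigram probabilities are estimated by relative frequency from a tiny made-up corpus and then used to predict the next word. All function names and the corpus are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional

# --- Rule-based: a deterministic FSA for the sheep-talk language baa+! ---
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def fsa_accepts(s: str) -> bool:
    """Run the automaton over s; reject as soon as no transition is defined."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


# --- Probabilistic: bigram model estimated by relative frequency (MLE) ---
def train_bigrams(sentences: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next | prev) = count(prev, next) / count(prev, anything)."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {w: c / total for w, c in following.items()}
    return model


def predict_next(model: dict[str, dict[str, float]], prev: str) -> Optional[str]:
    """Most probable next word after `prev`, or None if `prev` was never seen."""
    if prev not in model:
        return None
    return max(model[prev], key=model[prev].get)


corpus = ["john hit bill", "john hit bill", "bill hit john"]  # made-up training data
model = train_bigrams(corpus)
print(fsa_accepts("baaa!"), fsa_accepts("ba!"))  # True False
print(predict_next(model, "hit"))                # bill
```

The automaton's behavior is fixed entirely by the transition table its author wrote; the bigram model's behavior comes from the training data, so changing the corpus changes its predictions without any code changes.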

Slide 11

Current Real-World Applications
Search: large corpora, e.g. Google
Question answering: e.g. IBM's Jeopardy! system, DARPA who/what/where… , Ask Jeeves
Translating between one language and another: e.g. Google Translate, Babelfish
Summarizing large amounts of text or speech: e.g. your email, the news, voicemail
Sentiment analysis: restaurant or movie reviews
Dialog systems: e.g. Amtrak's "Julie"

Slide 12

Instructor
Julia Hirschberg, CEPSR 705
Focus: Spoken Language Processing
Lab: The Speech Lab, CEPSR 7LW3-A
Research:
Deceptive speech
Charismatic speech
Emotional speech: anger, uncertainty
Speech summarization: Broadcast News
Spoken Dialog Systems: Games Corpus
'Translating Prosody': English–Mandarin
Text2Scene Synthesis

Slide 13

Course Details
Teaching Assistants:
Mohamed Altantawy
Email:
Office Hours: CEPSR 7LW1 (Speech Lab), W 5-6, Th 5:30-6:30
Will handle the CVN course
Wei Yun Ma
Email:
Office Hours: CEPSR 725, Tu 10-12

Slide 14

Text: Daniel Jurafsky and James H. Martin, Speech and Language Processing, second edition
Note the errata available on the website
Check CourseWorks for additional information on the class, homework assignments, and posting questions
Assignments:
3 homework assignments: question answering, text clustering, and a pleasant surprise
Midterm and final exams
Five "free" late days for homeworks; after that, 10% off per late day (not usable on HW1, however)
You will need a CS account

Slide 15

Recorded Lecture Availability
For on-campus students: on the CVN site

Slide 16

Grading
HW1: 10%
HW2: 20%
HW3: 20%
Midterm: 15%
Final: 25%
Class participation: 10%

Slide 17

Academic Integrity
Copying or paraphrasing someone else's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is forbidden and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you are going to have trouble completing an assignment, please talk to the instructor or TA before the due date. Everyone: read/write-protect your homework files at all times.

Slide 18

For Next Class
Look at the syllabus and ask questions about anything you don't understand
Read Chapters 1-2 of J&M