Building Methodology


Presentation Transcript

Slide 1

Building Methodology © Arabic WordNet

Slide 2

Methodologies developed in various projects
- EuroWordNet: English, Dutch, German, French, Spanish, Italian, Czech, Estonian; 10,000 up to 50,000 synsets
- BalkaNet: Romanian, Bulgarian, Turkish, Slovenian, Greek, Serbian; 10,000 synsets

Slide 3

Main strategies for building wordnets
- Expand approach: translate WordNet synsets into another language and take over the structure
  - easier and more efficient method
  - compatible structure with WordNet
  - vocabulary and structure are close to WordNet but also biased by it
- Merge approach: create an independent wordnet in another language and align it with WordNet by generating the appropriate translations
  - more complex and labor-intensive
  - different structure from WordNet
  - language-specific patterns can be maintained
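The expand approach can be illustrated with a minimal sketch. It assumes a toy English wordnet and a toy English-Dutch lexicon; the `Synset` class, the `expand` function, and the synset identifiers are hypothetical names made up for this illustration, not part of any wordnet toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    lemmas: list
    hypernyms: list = field(default_factory=list)  # ids of hypernym synsets


def expand(source_wordnet, lexicon):
    """Translate each source synset and take over its hypernym structure."""
    target = {}
    for sid, synset in source_wordnet.items():
        # Collect every translation the bilingual lexicon offers for the lemmas.
        translated = [t for lemma in synset.lemmas for t in lexicon.get(lemma, [])]
        if translated:
            target[sid] = Synset(sid, translated, list(synset.hypernyms))
    # Keep only hypernym links whose target synset was also translated.
    for synset in target.values():
        synset.hypernyms = [h for h in synset.hypernyms if h in target]
    return target


# Toy data (assumed for illustration only).
english = {
    "building-n": Synset("building-n", ["building"]),
    "house-n": Synset("house-n", ["house"], ["building-n"]),
    "church-n": Synset("church-n", ["church"], ["building-n"]),
}
lexicon = {"building": ["gebouw"], "house": ["huis"]}

dutch = expand(english, lexicon)
```

Because the target wordnet reuses the source identifiers and hypernym links, its structure stays compatible with the source, which is also why it inherits the source's bias. Synsets without a translation ("church" in this toy lexicon) are simply left out and would need manual work.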

Slide 4

General criteria for choosing an approach:
- The purpose of the resource: machine translation, cross-lingual information retrieval, deep semantic analysis, domain applications
- Available resources for the specific language
- Properties of the language
- Maximize the overlap with wordnets for other languages
- Maximize semantic consistency within and across wordnets
- Focus the manual effort where it is needed most
- Exploit automatic techniques as far as possible

Slide 5

Top-down methodology
- Develop a core wordnet (5,000 synsets):
  - all the semantic building blocks, or foundation, needed to define the relations for all other, more specific synsets, e.g. building -> house, church, school
  - provide a formal and explicit semantics
- Validate the core wordnet:
  - does it include the most frequent words?
  - are semantic constraints violated?
- Extend the core wordnet (5,000 synsets or more):
  - automatic techniques for more specific concepts with high-confidence results
  - add other levels of hyponymy
  - add specific domains
  - add "easy" derivational words
  - add "easy" translation equivalents
- Validate the complete wordnet
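One of the validation questions above, whether the core wordnet includes the most frequent words, can be sketched as a simple coverage check. The function name, the word counts, and the cutoff are assumptions chosen for illustration; a real check would use a corpus-derived frequency list for the target language.

```python
def frequent_word_coverage(core_lemmas, frequency_list, top_n):
    """Return (coverage ratio, missing words) for the top_n most frequent words."""
    ranked = sorted(frequency_list.items(), key=lambda item: -item[1])
    top_words = [word for word, _ in ranked[:top_n]]
    missing = [word for word in top_words if word not in core_lemmas]
    if not top_words:
        return 1.0, []
    coverage = (len(top_words) - len(missing)) / len(top_words)
    return coverage, missing


# Toy data (assumed): corpus counts and the lemmas already in the core wordnet.
frequencies = {"house": 50, "teach": 40, "school": 30, "gin": 1}
core = {"house", "school"}

coverage, missing = frequent_word_coverage(core, frequencies, top_n=3)
```

The `missing` list gives the words that should be considered for manual addition before the core wordnet is extended automatically.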

Slide 6

Developing a core wordnet
- Define a set of concepts (so-called Base Concepts) that play an important role in wordnets:
  - high position in the hierarchy
  - high connectivity
  - represented as English WordNet synsets
- Common base concepts: shared by various wordnets in different languages
- Local base concepts: not shared
- EuroWordNet: 1024 synsets, shared by at least 2 languages
- BalkaNet: 5000 synsets (including the 1024)
- Common semantic framework for all Base Concepts, in the form of a Top-Ontology
- Manually translate all Base Concepts (English WordNet synsets) to synsets in the local languages (applied for 13 wordnets)
- Manually build and verify the hypernym relations for the Base Concepts
- All 13 wordnets are developed from a similar semantic core closely related to the English WordNet
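The split between common and local base concepts can be sketched as follows, assuming each language team has already selected its own base concepts and expressed them as English WordNet synset identifiers. The function name, language codes, and identifiers are hypothetical; only the "shared by at least 2 languages" threshold comes from the slide.

```python
from collections import Counter


def common_base_concepts(local_selections, min_languages=2):
    """Return the synset ids selected as base concepts in enough languages."""
    counts = Counter(
        synset_id
        for selection in local_selections.values()
        for synset_id in set(selection)
    )
    return {synset_id for synset_id, n in counts.items() if n >= min_languages}


# Toy selections per language (assumed for illustration only).
selections = {
    "nl": {"building-n", "teach-v"},
    "es": {"building-n", "fish-n"},
    "it": {"teach-v"},
}

common = common_base_concepts(selections)
# Local base concepts are whatever a language selected beyond the common set.
local_es = selections["es"] - common
```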

Slide 7

Top-down approach
[Diagram labels: Top-Ontology (63 TCs); 1024 CBCs; Hyperonyms; CBC Representatives; Local BCs; WMs related via non-hyponymy; First Level Hyponyms; Remaining Hyponyms; Remaining WordNet 1.5 Synsets; Inter-Lingual-Index]

Slide 8

Global Wordnet Association
- EuroWordNet: English, German, Spanish, French, Italian, Dutch, Czech, Estonian
- BalkaNet: Romanian, Bulgarian, Turkish, Slovenian, Greek, Serbian
- Other languages: Arabic, Polish, Welsh, Chinese, 20 Indian Languages, Brazilian Portuguese, Hebrew, Latvian, Persian, Kurdish, Avestan, Baluchi, Hungarian, Danish, Swedish, Portuguese, Korean, Russian, Basque, Catalan, Thai

Slide 9

Top-down methodology
[Diagram labels: Core wordnet (5000 synsets); 1000 Synsets; 5000 Synsets; SUMO Ontology; Arabic word frequency; English Arabic Lexicon (teach - darrasa); CBC, SBC, ABC; EuroWordNet and BalkaNet Base Concepts; WordNet Synsets 1045678-v {teach} and 1045678-v {darrasa}; Hypernyms; Next Level Hyponyms; Arabic roots & derivation rules; WordNet Domains; More Hyponyms; Domain "chemics"; Named Entities; Easy Translations; Arabic Wordnet; English Wordnet]

Slide 10

Advantages of the approach
- Well-defined semantics that can be inherited down to more specific concepts
- Consistency checks can be applied
- Automatic techniques can use the semantic basis
- The most frequent concepts and words are covered
- High overlap and compatibility with other wordnets
- Manual effort is focused on the most difficult concepts and words
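The first advantage, semantics inherited down to more specific concepts, can be sketched as a walk up the hypernym chain. This is a simplification that assumes each synset has a single hypernym; the function name, synset names, and feature labels are made up for illustration and are not taken from an actual top ontology.

```python
def inherited_features(synset, hypernym_of, features):
    """Collect the features of a synset and of all its hypernym ancestors."""
    collected = set()
    seen = set()
    current = synset
    # Stop at the top of the hierarchy, or if a cycle is detected.
    while current is not None and current not in seen:
        seen.add(current)
        collected |= features.get(current, set())
        current = hypernym_of.get(current)
    return collected


# Toy hierarchy and feature labels (assumed for illustration only).
hypernym_of = {"church": "building", "building": "artifact"}
features = {"artifact": {"Artifact"}, "building": {"Building"}}

church_features = inherited_features("church", hypernym_of, features)
```

Only the core concepts need explicit features here; a specific synset such as "church" receives them through its hypernyms, which is what makes consistency checks on the extended wordnet possible.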

Slide 11

Distribution over the top ontology clusters

Slide 14

Overview of equivalence relations to the ILI

Relation            POS              Sources : Targets     Example
eq_synonym          same             1 : 1                 auto, voiture : car
eq_near_synonym     any              many : many           apparaat, machine, toestel : apparatus, machine, device
eq_hyperonym        same             many : 1 (usually)    citroenjenever : gin
eq_hyponym          same (usually)   1 : many              dedo : toe, finger
eq_metonymy         same             many/1 : 1            universiteit, universiteitsgebouw : university
eq_diathesis        same             many/1 : 1            raken (cause), raken : hit
eq_generalization   same             many/1 : 1            schoonmaken : clean
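As a rough sketch, equivalence links like the examples above could be stored as (relation, source word, ILI target) triples and grouped to inspect their cardinality. The tuple layout and the function name are assumptions for illustration, not the EuroWordNet database format.

```python
from collections import defaultdict

# A few example links taken from the overview above.
EQ_LINKS = [
    ("eq_near_synonym", "apparaat", "apparatus"),
    ("eq_near_synonym", "apparaat", "machine"),
    ("eq_near_synonym", "apparaat", "device"),
    ("eq_hyperonym", "citroenjenever", "gin"),
    ("eq_hyponym", "dedo", "toe"),
    ("eq_hyponym", "dedo", "finger"),
]


def targets_by_source(links, relation):
    """Group the ILI targets of one relation type by source word."""
    grouped = defaultdict(list)
    for rel, source, target in links:
        if rel == relation:
            grouped[source].append(target)
    return dict(grouped)


hyponym_links = targets_by_source(EQ_LINKS, "eq_hyponym")
```

Grouping by source makes the 1 : many pattern of eq_hyponym visible: the single source word "dedo" maps to both "toe" and "finger".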

Slide 15

Filling gaps in the ILI
Types of gaps:
- Genuine, cultural gaps for things not known in English culture, e.g. citroenjenever, which is a kind of gin made out of lemon skin
  - non-productive
  - non-compositional
- Pragmatic gaps, in the sense that the concept is known but is not expressed by a single lexicalized form in English, e.g. container, borrower, cajera (female cashier)
  - productive
  - compositional
Universality of gaps: concepts occurring in at least 2 languages

Slide 16

Productive and Predictable Lexicalizations exhaustively linked to the ILI
- {doodslaan V} NL, {totschlagen V} DE: hypernym -> beat, kill
- {doodstampen V} NL, {tottrampeln V} DE: hypernym -> stamp, kill
- {doodschoppen V} NL: hypernym -> kick, kill
- {cajera N} ES, {casière} NL: hypernym -> cashier; in_state -> female
- {alevín N} ES: hypernym -> fish; in_state -> young

Slide 17

Top-down methodology
[Diagram labels: SUMO Ontology; Arabic word frequency; English Arabic Lexicon; 1000 Synsets; SBC, CBC, ABC; EuroWordNet and BalkaNet Base Concepts; 5000 Synsets; Hypernyms; Next Level Hyponyms; Arabic roots & derivation rules; WordNet Synsets; WordNet Domains; More Hyponyms; Domain "chemics"; Named Entities; Easy Translations; Arabic Wordnet; English Wordnet]