Scalable Knowledge Extraction from Legacy Sources with SEEK


Presentation Transcript

Slide 1

Scalable Knowledge Extraction from Legacy Sources with SEEK
Joachim Hammer
Dept. of CISE, University of Florida
3-June-2003

Slide 2

Outline of Talk
Motivation
SEEK Information Architecture
Knowledge Extraction
  Schema Extraction
  Code Analysis
Status and Future Work

Slide 3

SEEK Project
Faculty: Joachim Hammer, Mark Schmalz, William O'Brien, Ray Issa, Joe Geunes, Sherman Bai
Current Students: Oguzhan Topsakal, Mingxi Wu, Ivan Mutis, Haiyan Xie, Bibo Yang, Bo Lu
Computer Science, Building Construction, Industrial Engineering
Sponsored by NSF, Year 2 of 4

Slide 4

Motivation
Need for integrated access to intelligence sources in support of national security related applications
  Ability to quickly sift through enormous volumes of data for situational analysis and planning
Hard … sources have unique and often conflicting information systems, varying levels of sophistication
  E.g., Web, public records, sensor networks, state and federal databases, etc.
Current integration approaches rely on manual coding of connection software
  Not scalable
Development of a toolkit to facilitate integration of heterogeneous legacy data and knowledge

Slide 5

Information Environment
[Figure labels: public/information access, agency 1, agency 2, external data sources, reporting, command/shared analysis]
Many computers, many users, many information needs

Slide 6

Application Areas
Homeland Defense
  Threat prediction and detection
Emergency Management
  Emergency response planning, damage assessment
Extended Enterprise/Supply Network
  Decision/planning support to improve performance and customization

Slide 7

SEEK Environment & Context
[Figure labels: coordinator/lead agency, …, SEEK, Decision Support/Analysis, Secure Hosting Infrastructure]

Slide 8

SEEK Information Architecture
[Figure labels: Connection Toolkit, Legacy Source, SEEK Components, Domain Expert, Application, Knowledge Extraction Module (KEM), Analysis Module (AM), Wrapper (W), Legacy Data and Systems]
Secure, value-added extraction of source data
AM: query analysis, data synthesis (mediation)
W: source connection and translation
KEM: configuration of W and AM at build time

Slide 9

Run-Time: Querying and Analysis
Different information contexts: application, analysis module, source
  Translator needed to convert between information contexts
  Assume existence of translator between AM and application contexts
Analysis module provides robust (value-added) mediation
  Solution strategy based on information available in source
  Capable of composing final answer out of multiple source results
SEEK wrapper responsible for syntactic and semantic translations
  Formulates source queries based on capabilities of source
  Restructures source results to conform to information context of AM

Slide 10

Build-Time: Knowledge Extraction
Extract information about legacy source to facilitate development of wrapper and configuration of AM
Produces "description" of accessible knowledge in source
  Schema extraction from data source
  Analysis of application code to augment schema with semantics and extract business rules
Schema Matching to infer mappings between information context of AM/application and that of legacy source
Quality and accuracy of extracted knowledge (and hence the wrapper and AM) improves over time and with human input

Slide 11

Architectural Overview
[Figure labels: Domain Model; Domain Ontology; Data Reverse Engineering (DRE), containing Schema Extractor (SE) and Semantic Analyzer (SA); Schema Information; Embedded Queries; Schema Matching (SM); Schema, semantics, business rules; Mapping rules; to wrapper generator; Legacy Source, containing Legacy Application Code and Legacy DB. Edge labels: revise, validate; train, validate]

Slide 12

Schema Extraction
Based on data reverse engineering algorithms, e.g., Chiang 94/95, Petit et al. 96
  Reduced dependency on human input
  Eliminated limitations (e.g., consistent naming, legacy schema in 3-NF)
Use database catalog to directly extract concepts and simple constraints
Use database instances to infer relationships and constraints
Interact with code analysis to augment schema with semantics
Produces E/R-like representation of the entities, relationships, and constraints
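As an illustration of inferring relationships from database instances, the sketch below checks for unary inclusion dependencies, i.e. column pairs where every value of one column also occurs in a column of another table. It is only a minimal sketch, not the SEEK implementation: the function name `unary_inclusion_dependencies` and the sample rows are hypothetical, and only the table and column names (`Proj`, `T`, `Proj_ID`, `T_ID`, `P_Name`) are borrowed from the slides.

```python
from itertools import permutations


def unary_inclusion_dependencies(tables):
    """Return (child_table, child_col, parent_table, parent_col) tuples
    where every non-null child value also appears in the parent column."""
    # Collect the set of non-null values observed in each column.
    columns = {}
    for table_name, rows in tables.items():
        for row in rows:
            for col, value in row.items():
                values = columns.setdefault((table_name, col), set())
                if value is not None:
                    values.add(value)

    found = []
    for left, right in permutations(columns, 2):
        if left[0] == right[0]:
            continue  # only compare columns from different tables
        if columns[left] and columns[left] <= columns[right]:
            found.append(left + right)
    return sorted(found)


# Hypothetical instance data; project 3 has no tasks yet.
tables = {
    "Proj": [
        {"Proj_ID": 1, "P_Name": "Bridge"},
        {"Proj_ID": 2, "P_Name": "Tunnel"},
        {"Proj_ID": 3, "P_Name": "Depot"},
    ],
    "T": [
        {"T_ID": 10, "Proj_ID": 1},
        {"T_ID": 11, "Proj_ID": 1},
        {"T_ID": 12, "Proj_ID": 2},
    ],
}
deps = unary_inclusion_dependencies(tables)
```

Here `deps` holds the single candidate `T.Proj_ID` ⊆ `Proj.Proj_ID`, which suggests a foreign key from tasks to projects. A dependency found this way is only a candidate: it holds for the current instances and still needs confirmation from the catalog, the code analysis, or a domain expert.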

Slide 13

Semantic Analysis
Identify semantic descriptions for schema items in database in application code
  E.g., trace database schema names back to output statements
Using code slicing to reduce application code to only those statements that are relevant to the analyzer (Horwitz, Reps 92)
Apply pattern matcher
  find relationships among variables
  identify patterns that encode business information
  E.g., business rules encoded in IF-THEN-ELSE statements
Versions for C, C++, and Java
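The idea of tracing schema names back to output statements can be sketched on a fragment like the scheduling program on Slide 16. The semantic analyzer described here works on an AST with code slicing; the sketch below swaps in plain regular expressions purely for illustration, and the function names `bind_host_variables` and `describe_columns` are hypothetical.

```python
import re

# Fragment modelled on the scheduling example (embedded SQL in C).
SOURCE = '''
EXEC SQL SELECT T_ST_D, T_FIN_D INTO :aValue, :cValue FROM T WHERE T_PRITY = :bValue;
printf("Task Start Date %d", aValue);
printf("Task Finish Date %d", cValue);
'''

SELECT_INTO = re.compile(
    r"SELECT\s+(.*?)\s+INTO\s+(.*?)\s+FROM\s+(\w+)", re.S | re.I
)
PRINTF = re.compile(r'printf\s*\(\s*"([^"%]*)%\w"\s*,\s*(\w+)\s*\)')


def bind_host_variables(source):
    """Map each host variable of a SELECT ... INTO to its (table, column)."""
    bindings = {}
    for cols, host_vars, table in SELECT_INTO.findall(source):
        col_names = [c.strip() for c in cols.split(",")]
        var_names = [v.strip().lstrip(":") for v in host_vars.split(",")]
        for col, var in zip(col_names, var_names):
            bindings[var] = (table, col)
    return bindings


def describe_columns(source):
    """Attach the label of each printf to the column its argument came from."""
    bindings = bind_host_variables(source)
    descriptions = {}
    for label, var in PRINTF.findall(source):
        if var in bindings:
            table, col = bindings[var]
            descriptions[f"{table}.{col}"] = label.strip()
    return descriptions


descriptions = describe_columns(SOURCE)
```

With this input, `descriptions` maps `T.T_ST_D` to "Task Start Date" and `T.T_FIN_D` to "Task Finish Date". A regex approach breaks as soon as a value passes through intermediate assignments, which is what slicing over an AST is meant to handle.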

Slide 14

DRE Implementation
[Figure: numbered processing steps]
1 AST Generation
2 Dictionary Extraction
3 Code Analysis
4 Inclusion Dependency Mining
5 Relation Classification
6 Attribute Classification
7 Entity Identification
8 Relationship Classification
[Other figure labels: Legacy Source, Application Code, DB Interface Module, Data configuration, Queries, AST, Metadata Repository, Business Knowledge, Schema, Knowledge Encoder, XML DTD, XML DOC, To Schema Matcher]

Slide 15


Slide 16

Scheduling Application

/* program for task scheduling */
char *aValue;
char *cValue;
int bValue = 0;
/* more code … */
EXEC SQL SELECT T_ST_D, T_FIN_D INTO :aValue, :cValue FROM T WHERE T_PRITY = :bValue;
/* more code … */
int flag = 0;
if (cValue <= aValue) {
    flag = 1;
    /* exception handling */
}
/* more code … */
printf("Task Start Date %d", aValue);
printf("Task Finish Date %d", cValue);
/* more code … */

Slide 17

Extracted Conceptual Schema
[Figure: E/R diagram. Entities: Proj, Res, Assn, T, Avail. Relationships: has (cardinalities 1, N, M). Attributes: Proj_ID, P_Name, P_ID, Des_S, Res_UID, Res_Name, Res_ID, Avail_UID, T_ID, T_UID]

Slide 18

Result of Code Analysis

Slide 19

Extracted Business Rules
Variables have been replaced by their extracted meaning (to the extent that they are known)
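The substitution step can be pictured with a small sketch. It assumes the variable meanings were already recovered by the code analysis; the function name `substitute_meanings`, the bracket notation, and the `meanings` dictionary are hypothetical, and the condition is taken from the scheduling example on Slide 16.

```python
import re

IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def substitute_meanings(condition, meanings):
    """Replace each known variable with its extracted meaning in brackets.

    Variables without a known meaning are left unchanged.
    """
    def replace(match):
        name = match.group(0)
        if name in meanings:
            return "[" + meanings[name] + "]"
        return name

    return IDENTIFIER.sub(replace, condition)


# Meanings assumed to come from tracing variables to output statements.
meanings = {"aValue": "Task Start Date", "cValue": "Task Finish Date"}
rule = substitute_meanings("cValue <= aValue", meanings)
```

Here `rule` becomes `[Task Finish Date] <= [Task Start Date]`, the condition guarding the exception branch in the example program. A variable such as `bValue`, whose meaning is not known, would stay as written.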

Slide 20

Current Status & Future Research
Current
  Implemented interactive knowledge extraction prototype consisting of SE and SA (supply chain & construction domains)
  Developing schema matching module
  Application of SEEK toolkit to emergency response system
    Data collection in cooperation with City of Gainesville Fire & Rescue
    Application to management of EOC planned
Future
  Development of analysis module
  Enhance DRE with ability to improve with time and use cases

Slide 21

Summary and Conclusion
SEEK is a structured approach to integrating domain-specific legacy sources
Modular architecture provides several important capabilities
  (Semi)automatic knowledge extraction
  DRE, semantic analysis, schema matching
Important contributions to theory of knowledge capture and integration
  Requirement for building scalable sharing architectures
  Enabling technology for (semi)automatic ontology creation
  Enabler for Semantic Web?

Slide 22

More Info
M. S. Schmalz, J. Hammer, M. Wu, and O. Topsakal, "EITH - A unifying representation for database schema and application code in enterprise knowledge extraction." To be presented at 22nd International Conference on Conceptual Modeling (ER 2003), Chicago, IL, 2003.
"Scalable Extraction of Enterprise Knowledge." Conditionally accepted for publication in Research Frontiers in Supply Chain Management and E-Commerce, E. Akcaly, J. Geunes, P.M. Pardalos, H.E. Romeijn, and Z.J. Shen (eds). Kluwer Science Series in Applied Optimization. (Accepted for publication in 2004.)
"SEEKing knowledge in legacy information systems to support interoperability." ECAI-02 Workshop on Ontologies and Semantic Interoperability, Lyon, France, July 21-26, 2002.
"SEEK: accomplishing enterprise information integration across heterogeneous sources," ITCON – Journal of Information Technology in Construction – Special Edition on Knowledge Management, 7, pp. 101-124, 2002.
"Robust mediation of supply chain information." ASCE Specialty Conference on Fully Integrated and Automated Project Processes (FIAPP) in Civil Engineering, Blacksburg, VA, September 26-28, 2001, 415-425.
Site: for/