Situational Business Intelligence


Presentation Transcript

Slide 1

Situational Business Intelligence Volker Markl Technische Universität Berlin

Slide 2

Agenda
Traditional Business Intelligence
Next Generation Business Intelligence
Building Blocks
  Cloud Computing, MapReduce, and Hadoop
  Pig Latin
  UIMA, Social Tagging
The Long Tail of Situational Applications
Situational Business Intelligence
Challenges

Slide 3

Traditional Business Intelligence

Slide 4

How Did We Get Here?
Actual and forecast BI tools software revenue as reported by IDC
Stages (most recent first): BI over Text, Web-enabled Business Intelligence, Client/Server Business Intelligence, Query/Reporting, OLAP, Batch Reporting
Source: IDC
Source: Gartner

Slide 5

2008 CIO Priorities
2008 CIO Technology Priorities
Question: To what extent will each of the following technologies be a top 5 priority for you in 2008?

Rank 2008  Technology                                          Rank 2007  Rank 2006  2008 Increase*
1          Business Intelligence Applications                  1          1          11.20%
2          Enterprise Applications (ERP, SCM, and CRM)         2          **         8.02%
3          Server and Storage Technologies (Virtualization)    5          9          8.45%
4          Legacy Application Modernization                    3          10         5.79%
5          Security Technologies                               6          2          8.53%
6          Technical Infrastructure                            8          12         4.67%
7          Networking, Voice, and Data Communications (VoIP)   4          8          6.83%
8          Collaboration Technologies                          10         4          7.75%
9          Document Management                                 9          **         7.91%
10         Service-Oriented Technologies (SOA and SOBA)        7          6          6.71%

* Unweighted average spending change
** New question for 2007
Source: 2008 Gartner Executive Programs CIO Survey, January 10, 2008

Slide 6

What are CIOs missing?
Question: Please give me an example of how your business intelligence solution could better meet your organization's main objectives.
Better/more data: 22.9%
Faster/speedier retrieval: 14.3%
Accurate/up-to-date information: 11.4%
Consistent platform: 8.6%
Better integration: 8.6%
Standardization: 8.6%
Other single mentions: 40.0%
Source: Business Intelligence Survey, IDC

Slide 7

Next Generation Business Intelligence
The next generation of Business Intelligence (NGBI) correlates data warehouses with text and semi-structured data from web services of corporate intranets and the web.
Sources: Internet (Text), Intranet (Text, XLS, XML), Data Warehouse, Data Marts
Steps: Information Extraction, Semantic Integration, Load/Refresh or ad hoc, Analysis (Schema and Entities)
Example questions:
  Who is leading in American Idol?
  Who are the biggest players in the Linux market?
  Which insurance policy customers are at risk of being hit by a current storm?

Slide 8

Answering an NGBI Query
Who are the biggest players in the "Linux" market?
Web 2.0 reports from 332 Wiki News docs (January–March 2007)

Slide 9

Data Source Identification
Data Warehouse, Master Data, Information Providers, Information Marketplaces, Crawling (Internet/Intranet)
[Process diagram: Data Fusion, Atomic Entity extraction, Data Cleansing, Schema extraction, Data Source ID]

Slide 10

Atomic Entity Extraction
[Process diagram: Data Fusion, Atomic Entity extraction, Data Cleansing, Schema extraction, Data Source identification]
Out-of-the-box data: Web services for complex, atomic and named entities
Frameworks: Infrastructures for extracting, managing and scalable storage of named entities; Web services for extracting named entities
Basic Components: Screen scraper
Additional extraction and data cleansing effort

Slide 11

Ad hoc analysis process
[Process diagram: Data Fusion, Atomic Entity extraction, Data Cleansing, Schema extraction, Data Source ID]

Slide 12

Schema Extraction
[Process diagram: Data Fusion, Base extraction, Data Cleansing, Schema extraction, Pre Process]
Company Technology -> Technology
Company Technology -> Company

Slide 13

Data Cleansing
[Process diagram: Data Fusion, Base extraction, Data Cleansing, Schema extraction, Pre Process]
Duplicates

Slide 14

Data Fusion
[Process diagram: Data Fusion, Base extraction, Data Cleansing, Schema extraction, Pre Process]
Information Integration steps: Schema Mapping, Duplicate Detection, Data Fusion
Functions shown: match, max length, min
Values shown for Data Source A and Data Source B: Apple iPhone 3 Gen 299.95; Apple iPhone 3 Gen 199.99; Apple iPhone 3G 199.99
e.g., HumMer (U Potsdam)
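The slide names "max length" and "min" next to the iPhone records, which suggests per-attribute functions that decide which value survives when two duplicate records disagree. The following is a minimal Python sketch of that idea, not the HumMer implementation. The function names `fuse` and `longest`, the attribute names, and the assignment of the two records to Source A and Source B are assumptions made for illustration.

```python
def longest(values):
    """Resolution function: keep the longest string ("max length")."""
    return max(values, key=len)


def fuse(records, resolvers):
    """Merge duplicate records into one record.

    `resolvers` maps each attribute to the function that picks the
    surviving value among the non-null candidates.
    """
    fused = {}
    for attr, resolve in resolvers.items():
        values = [r[attr] for r in records if r.get(attr) is not None]
        fused[attr] = resolve(values) if values else None
    return fused


# Hypothetical assignment of the slide's values to the two sources.
source_a = {"name": "Apple iPhone 3 Gen", "price": 299.95}
source_b = {"name": "Apple iPhone 3G", "price": 199.99}

# Name: keep the longest spelling. Price: keep the minimum.
fused = fuse([source_a, source_b], {"name": longest, "price": min})
print(fused)
```

Under these assumptions the fused record keeps the longer product name and the lower price, which matches the "Apple iPhone 3 Gen 199.99" value that appears on the slide.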

Slide 15

Data Fusion
[Process diagram: Data Fusion, Base extraction, Data Cleansing, Schema extraction, Pre Process]
Integration of complementary tuples
Elimination of identical tuples
Elimination of subsumed tuples
Conflict resolution, e.g. f(b,e)
[Figure: example tuples over the values a, b, c, d, e, with "-" marking a missing value]
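Two of the steps on this slide, removing identical tuples and removing subsumed tuples, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the talk: a missing value ("-" on the slide) is represented as `None`, and a tuple counts as subsumed when another tuple agrees with it on every non-null value and fills in at least one more. The names `subsumes` and `minimize` are made up for this sketch.

```python
def subsumes(t1, t2):
    """True if t1 carries every non-null value of t2 and at least one more."""
    agrees = all(b is None or a == b for a, b in zip(t1, t2))
    return agrees and t1 != t2


def minimize(tuples):
    """Drop identical tuples, then drop tuples subsumed by another tuple."""
    unique = list(dict.fromkeys(tuples))  # removes duplicates, keeps order
    return [t for t in unique if not any(subsumes(o, t) for o in unique)]


rows = [
    ("a", "b", None),
    ("a", "b", None),    # identical to the first tuple
    ("a", None, None),   # subsumed by ("a", "b", None)
    ("c", None, "d"),
]
print(minimize(rows))
```

Conflict resolution, where two tuples hold different non-null values for the same attribute, is a separate step and is not covered by this sketch.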

Slide 16

Address Uncertainty: Query Refinement
Extract->SELECT->PROJECT-JOIN-(COUNT, AVG, SUM, MEAN..)
"Everything" about Dell?
The market of "Linux" from 2007-2008?
"What's the average analyst quote about the IBM stock price for the last month?"
Drill down on region, time, organization, …

Slide 17

Building Blocks Cloud Computing Map Reduce Pig UIMA Social Tagging

Slide 18

Cloud Computing
What is Cloud Computing?
Computing platform architecture
Scales to any application
High fault tolerance
No generally accepted definition available
Distinction from Utility or Grid Computing is not obvious

Slide 19

Cloud Computing
How does Cloud Computing work?
Lots of loosely coupled computers
Use of commodity hardware
Flexible up- or downscaling of resources
APIs offer access to cloud computing systems
Software handles parallelization, hardware failures and error handling
Resources (e.g. storage, computing power) can be bought as services (pay per use, e.g. Amazon)

Slide 20

MapReduce – Programming Model
Program logic is split into 2 functions: Map(k,v) and Reduce(k,list(v))
  Functions receive and produce (key, value) pairs
  Map(k,v) computes for each (k,v) pair an intermediate (k_i,v_i) pair
  Reduce(k,list(v)) merges all values with the same key k and outputs the result
MapReduce programs are easy to develop
  Frameworks provide libraries
  Frameworks handle parallelization, distribution and error handling
  Only application-specific source code is required (no parallelization and error handling code)

Slide 21

MapReduce – Group AVG Example
Input Data:
  NewYork, US, 10
  LosAngeles, US, 40
  London, GB, 20
  Berlin, DE, 60
  Glasgow, GB, 10
  Munich, DE, 30
  …
MAP(k,v) emits intermediate (K,V) pairs: (US,10) (US,40) (GB,20) (DE,60) (GB,10) (DE,30)
REDUCE(k,list(v)) receives the pairs grouped by key: (US,10) (US,40); (GB,20) (GB,10); (DE,60) (DE,30)
Result: (DE,45) (GB,15) (US,25)
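The group average example can be reproduced with a small single-process Python sketch. It only imitates the programming model: `run_mapreduce` is a made-up stand-in for a real engine such as Hadoop, and the record keys (line offsets) are an assumption, since the slide does not show them.

```python
from collections import defaultdict


def map_fn(key, value):
    """Emit (country, amount) for one input record (city, country, amount)."""
    _city, country, amount = value
    yield country, amount


def reduce_fn(key, values):
    """Average all amounts that share the same country key."""
    return key, sum(values) / len(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy engine: run map, group intermediate pairs by key, run reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return [reduce_fn(k, vs) for k, vs in sorted(groups.items())]


rows = [
    ("NewYork", "US", 10), ("LosAngeles", "US", 40), ("London", "GB", 20),
    ("Berlin", "DE", 60), ("Glasgow", "GB", 10), ("Munich", "DE", 30),
]
records = list(enumerate(rows))  # hypothetical keys: the row positions
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)  # [('DE', 45.0), ('GB', 15.0), ('US', 25.0)]
```

Only `map_fn` and `reduce_fn` are application-specific here. In a real framework the grouping step, the distribution over nodes and the error handling are provided by the engine.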

Slide 22

MapReduce
Programming Model
  For processing of huge amounts of data
  Massive parallelization of computing tasks
  Applicable to many real-world applications
  MapReduce programs are easy to implement
MapReduce Engine
  Environment to run MapReduce programs
  Distributes computing tasks
  Errors are transparently handled
  Very scalable architecture
Examples: Google MapReduce & Apache Hadoop

Slide 23

Hadoop
What is Hadoop?
  Free software framework for data-intensive applications
  Enables distributed processing of vast amounts of data on cloud computing architectures
  Supports clouds with 1000+ nodes
  Two components: Hadoop Distributed File System (HDFS) and MapReduce Engine
Where can you get Hadoop?
  Top-level Apache Project: http://hadoop.apache.org/core/

Slide 24

Hadoop - HDFS
Inspired by Google File System
Distributed storage for large files
Files are split up into several parts (default size 64MB)
Parts are spread over the HDFS nodes
Each part is replicated (default 3 times)
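The two defaults on this slide are enough for a quick back-of-the-envelope calculation. The Python sketch below is plain arithmetic, not an HDFS API; the function name `hdfs_footprint` is made up, and it assumes that "replicated 3 times" means three stored copies of every part in total.

```python
PART_SIZE = 64 * 1024**2   # default part size from the slide: 64 MB
REPLICATION = 3            # default number of copies per part


def hdfs_footprint(file_size, part_size=PART_SIZE, replication=REPLICATION):
    """Return (number of parts, stored part copies, raw bytes stored)."""
    parts = -(-file_size // part_size)  # ceiling division on integers
    return parts, parts * replication, file_size * replication


one_gb = 1024**3
parts, copies, raw_bytes = hdfs_footprint(one_gb)
print(parts, copies, raw_bytes // 1024**3)  # 16 48 3
```

Under these defaults a 1 GB file becomes 16 parts, stored as 48 part copies that occupy 3 GB of raw disk space across the nodes.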

Slide 25

Hadoop – MapReduce Engine
Runs MapReduce programs
Libraries for Java and C++
Assigns Map and Reduce tasks to computing nodes
Reduction of data transfer volume
  Tasks are assigned to nodes holding the data
Node failures are transparently handled
  Tasks are restarted on a node holding a replica of the data
[Figure: a TaskManager distributes MAP() tasks to nodes; one MAP() task fails and is restarted on another node]
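The two scheduling ideas on this slide, running a task where its data lives and restarting it on another replica holder after a failure, can be sketched as follows. This is a simplified Python illustration and not Hadoop's scheduler; `pick_node`, the node names and the part name are hypothetical.

```python
def pick_node(part, replicas, alive):
    """Return a live node that already stores a replica of `part`."""
    for node in replicas[part]:
        if node in alive:
            return node
    raise RuntimeError(f"no live replica holder for {part}")


# Hypothetical cluster state: one part stored on three nodes.
replicas = {"part-0": ["node1", "node2", "node3"]}
alive = {"node1", "node2", "node3"}

first_try = pick_node("part-0", replicas, alive)  # task runs next to its data
alive.discard(first_try)                          # simulate a node failure
retry = pick_node("part-0", replicas, alive)      # restart on another replica
print(first_try, retry)  # node1 node2
```

Because every candidate node already holds the part, neither the first attempt nor the restart needs to move the input data over the network.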

Slide 26

Hadoop
Who uses Hadoop?
  Amazon A9.com (Search Index Building, Analytics)
  Facebook (Logfile Analysis)
  Google & IBM (University Initiative to Address Internet-Scale Computing Challenges)
  Yahoo! (Crawling, Indexing, Searching)
    Yahoo! Hadoop Cluster runs Terabyte Sort Benchmark in 209 seconds
  And many others… (see http://wiki.apache.org/hadoop/PoweredBy)
Hadoop resembles Google's MapReduce Framework
  J. Dean, S. Ghemawat: "MapReduce: Simplified Data Processing on Large Clusters"

Slide 27

The Pig Project
A platform for analyzing large data sets
Pig consists of two parts:
  PigLatin: A Data Processing Language
  Pig Infrastructure (Grunt): An Evaluator for PigLatin programs
Where can you get Pig?
  Apache Incubator Project: http://incubator.apache.org/pig
Alternatives: HIVE (Facebook), JAQL (IBM Research)

Slide 28

PigLatin Data Processing Language
PigLatin is procedural (whereas SQL is declarative)
  Step-by-step programming approach
  PigLatin queries are easy to write and understand
Fully nestable data model
  Atomic values, tuples, bags, maps
Operators of two flavors:
  Relational style operators (filter, join, etc.)
  Functional-programming style operators (map)
Easy to extend by user functions
Example: "Find the top 10 most visited pages in each category"

  visits = load '/data/visits' as (user, url, time);
  gVisits = group visits by url;
  visitCounts = foreach gVisits generate url, count(visits);
  urlInfo = load '/data/urlInfo' as (url, category, pRank);
  visitCounts = join visitCounts by url, urlInfo by url;
  gCategories = group visitCounts by category;
  topUrls = foreach gCategories generate top(visitCounts,10);
  store topUrls into '/data/topUrls';

Example taken from: "Pig Latin: A Not-So-Foreign Language For Data Processing" Talk, Sigmod 2008
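To make the step-by-step character of the example concrete, the same dataflow can be written over in-memory Python lists. This is only an illustrative sketch, not how Pig evaluates the program: the function name `top_urls_per_category` and the sample data are made up, and the tuples follow the field order of the load statements, (user, url, time) and (url, category, pRank).

```python
from collections import Counter, defaultdict


def top_urls_per_category(visits, url_info, n=10):
    """Return the n most visited URLs for every category."""
    # group visits by url, then count the visits per url
    visit_counts = Counter(url for _user, url, _time in visits)
    # join the counts with urlInfo on url
    category_of = {url: category for url, category, _p_rank in url_info}
    # group the joined rows by category
    by_category = defaultdict(list)
    for url, count in visit_counts.items():
        if url in category_of:
            by_category[category_of[url]].append((url, count))
    # keep the top n urls per category, ordered by visit count
    return {
        category: sorted(rows, key=lambda row: row[1], reverse=True)[:n]
        for category, rows in by_category.items()
    }


# Hypothetical sample data.
visits = [("amy", "a.com", 1), ("bob", "a.com", 2),
          ("amy", "b.com", 3), ("cat", "c.com", 4)]
url_info = [("a.com", "news", 0.9), ("b.com", "news", 0.4),
            ("c.com", "sports", 0.5)]

top = top_urls_per_category(visits, url_info, n=1)
print(top)  # {'news': [('a.com', 2)], 'sports': [('c.com', 1)]}
```

Each comment corresponds to one statement of the PigLatin program, which is the sense in which the language is procedural: the author spells out the order of the steps instead of leaving it to a query optimizer.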

Slide 29

Pig Infrastructure
Currently two modes:
  Local: PigLatin programs are locally evaluated (run in a single JVM)
  MapReduce: PigLatin programs are compiled to sequences of MapReduce programs and executed (e.g. on Hadoop)
Example: Map 1 LOAD visits GROUP B
