Friday, August 23, 2002 Haipeng Guo KDD Research Group Department of Computing and Information Sciences Kansas State

Presentation Transcript

Slide 1

KDD Group Seminar: Dynamic Bayesian Networks
Friday, August 23, 2002
Haipeng Guo, KDD Research Group
Department of Computing and Information Sciences, Kansas State University

Slide 2

Presentation Outline
- Introduction to State-space Models
- Dynamic Bayesian Networks (DBNs)
  - Representation
  - Inference
  - Learning
- Summary
- References

Slide 3

The Problem of Modeling Sequential Data
- Sequential data modeling is essential in many fields
- Time series generated by a dynamic system: time-series modeling
- Sequences generated by a one-dimensional spatial process: bio-sequences

Slide 4

The Solutions
- Classic approaches to time-series prediction
  - Linear models: ARIMA (autoregressive integrated moving average), ARMAX (autoregressive moving average with exogenous variables)
  - Nonlinear models: neural networks, decision trees
- Problems with classic approaches
  - Prediction of the future is based on only a finite window
  - It is hard to incorporate prior knowledge
  - Difficulties with multi-dimensional inputs and/or outputs
- State-space models
  - Assume there is some underlying hidden state of the world (query) that generates the observations (evidence), and that this hidden state evolves in time, possibly as a function of our inputs
  - The belief state: our belief about the hidden state of the world given the observations up to the current time, y_1:t, and our inputs to the system, u_1:t: P(X | y_1:t, u_1:t)
  - The two most common state-space models: Hidden Markov Models (HMMs) and Kalman Filter Models (KFMs)
  - A more general state-space model: dynamic Bayesian networks (DBNs)

Slide 5

State-space Models: Representation
- Any state-space model must define a prior P(X_1), a state-transition function P(X_t | X_t-1), and an observation function P(Y_t | X_t)
- Assumptions:
  - Models are first-order Markov, i.e., P(X_t | X_1:t-1) = P(X_t | X_t-1)
  - Observations are conditionally first-order Markov: P(Y_t | X_t, Y_t-1) = P(Y_t | X_t)
  - Time-invariant or homogeneous
- Representations:
  - HMMs: X_t is a discrete random variable
  - KFMs: X_t is a vector of continuous random variables
  - DBNs: a more general and expressive language for representing state-space models
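The three ingredients above can be sketched concretely. Below is a minimal illustration in Python/NumPy (the slides themselves use MATLAB/BNT); the 2-state, 2-symbol HMM and all its numbers are assumptions chosen for illustration, not taken from the slides.

```python
import numpy as np

# Illustrative 2-state HMM: the three ingredients of a state-space model.
prior = np.array([0.6, 0.4])            # P(X_1): initial state distribution
transition = np.array([[0.7, 0.3],      # P(X_t | X_t-1): row = previous state
                       [0.2, 0.8]])
observation = np.array([[0.9, 0.1],     # P(Y_t | X_t): row = hidden state
                        [0.3, 0.7]])

# Each conditional distribution must sum to 1 over its outcomes.
row_sums_ok = (np.isclose(prior.sum(), 1.0)
               and np.allclose(transition.sum(axis=1), 1.0)
               and np.allclose(observation.sum(axis=1), 1.0))
```

Time-invariance here simply means the same `transition` and `observation` arrays are reused at every time step.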

Slide 6

State-space Models: Inference
- A state-space model defines how X_t generates Y_t and X_t+1. The goal of inference is to infer the hidden states (query) X_1:t given the observations (evidence) y_1:t.
- Inference tasks:
  - Filtering (monitoring): recursively estimate the belief state using Bayes' rule
    - Predict: compute P(X_t | y_1:t-1)
    - Update: compute P(X_t | y_1:t)
    - Discard the old belief state once we have computed the prediction ("rollup")
  - Smoothing: estimate the state of the past, given all the evidence up to the current time
    - Fixed-lag smoothing (hindsight): compute P(X_t-l | y_1:t), where l > 0 is the lag
  - Prediction: predict the future
    - Lookahead: compute P(X_t+h | y_1:t), where h > 0 is how far we want to look ahead
  - Viterbi decoding: compute the most likely sequence of hidden states given the data
    - MPE (abduction): x*_1:t = argmax P(x_1:t | y_1:t)
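The predict/update/rollup cycle can be sketched for a discrete model. This Python/NumPy sketch uses an illustrative 2-state HMM (all parameter values and the observation sequence are assumptions, not from the slides):

```python
import numpy as np

# Illustrative 2-state HMM parameters (assumed for this sketch).
prior = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X_t | X_t-1)
B = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(Y_t | X_t)

def filter_step(belief, y):
    """One rollup step: P(X_t | y_1:t) from P(X_t-1 | y_1:t-1)."""
    predicted = A.T @ belief          # predict: P(X_t | y_1:t-1)
    updated = B[:, y] * predicted     # weight by likelihood P(y_t | X_t)
    return updated / updated.sum()    # normalise (Bayes' rule)

belief = prior
for y in [0, 0, 1]:                   # an assumed observation sequence
    belief = filter_step(belief, y)   # old belief is discarded ("rollup")
```

Note that only the current belief vector is kept between steps, which is exactly the "rollup" idea: the full history y_1:t is summarised by P(X_t | y_1:t).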

Slide 7

State-space Models: Learning
- Parameter learning (system identification) means estimating from data the parameters that define the transition model P(X_t | X_t-1) and the observation model P(Y_t | X_t)
- The usual criterion is maximum likelihood (ML)
- The goal of parameter learning is to compute θ*_ML = argmax_θ P(Y | θ) = argmax_θ log P(Y | θ), or θ*_MAP = argmax_θ log P(Y | θ) + log P(θ) if we include a prior on the parameters
- Two standard approaches: gradient ascent and EM (Expectation Maximization)
- Structure learning: more ambitious
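In the special case where the states are fully observed, the ML estimates are just normalised counts; EM reduces to iterating the same counting step with expected counts. A Python/NumPy sketch of the fully observed case (the sequences below are assumptions for illustration):

```python
import numpy as np

# Assumed fully observed state and observation sequences (illustrative).
states = [0, 0, 1, 1, 1, 0]
obs    = [0, 0, 1, 1, 0, 0]
Q, O = 2, 2          # number of hidden states / observation symbols

trans_counts = np.zeros((Q, Q))
obs_counts = np.zeros((Q, O))
for t in range(len(states)):
    obs_counts[states[t], obs[t]] += 1        # count (X_t, Y_t) pairs
    if t > 0:
        trans_counts[states[t - 1], states[t]] += 1  # count transitions

# ML estimates: normalise each row into a conditional distribution.
A_ml = trans_counts / trans_counts.sum(axis=1, keepdims=True)
B_ml = obs_counts / obs_counts.sum(axis=1, keepdims=True)
```

When the states are hidden, EM's E-step replaces the hard counts with posterior expected counts computed by smoothing, and the M-step is this same normalisation.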

Slide 8

HMM: Hidden Markov Model
(Graph: chain X_1 → X_2 → X_3 → X_4 of hidden nodes, each emitting an observed node Y_1 ... Y_4)
- One discrete hidden node and one discrete or continuous observed node per time slice
- X: hidden variables; Y: observations
- Structure and parameters remain the same over time
- Three parameters in an HMM:
  - The initial state distribution P(X_1)
  - The transition model P(X_t | X_t-1)
  - The observation model P(Y_t | X_t)
- The HMM is the simplest DBN: a discrete state variable with arbitrary dynamics and arbitrary measurements
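The generative story behind this graph can be made explicit by ancestral sampling: draw X_1 from the prior, then alternate observation and transition draws. A Python/NumPy sketch with illustrative assumed parameters:

```python
import numpy as np

# Illustrative 2-state HMM parameters (assumptions for this sketch).
rng = np.random.default_rng(0)
prior = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X_t | X_t-1)
B = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(Y_t | X_t)

def sample_hmm(T):
    """Ancestral sampling of a length-T trajectory from the HMM."""
    x = rng.choice(2, p=prior)               # X_1 ~ P(X_1)
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        ys.append(rng.choice(2, p=B[x]))     # Y_t ~ P(Y_t | X_t)
        x = rng.choice(2, p=A[x])            # X_t+1 ~ P(X_t+1 | X_t)
    return xs, ys

xs, ys = sample_hmm(5)
```

Because the same A and B are used at every step, the sampler also illustrates the time-invariance assumption on the slide.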

Slide 9

KFM: Kalman Filter Model
(Graph: X_1 → X_2 hidden, each emitting an observed Y_1, Y_2)
- A KFM has the same topology as an HMM, but all the nodes are assumed to have linear Gaussian distributions:
  x(t+1) = F*x(t) + w(t), w ~ N(0, Q): process noise, x(0) ~ N(X(0), V(0))
  y(t) = H*x(t) + v(t), v ~ N(0, R): measurement noise
- Also known as Linear Dynamical Systems (LDSs): a partially observed stochastic process with linear dynamics and linear observations, f(a + b) = f(a) + f(b), both subject to Gaussian noise
- The KFM is the simplest continuous DBN: a continuous state variable with linear-Gaussian dynamics and measurements
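For this model, the filtering recursion has a closed form: the Kalman filter. A Python sketch of one predict/update cycle for the scalar (1-D) case of the equations above; F, H, Q, R and the measurements are illustrative assumptions:

```python
# One-dimensional Kalman filter for x(t+1) = F x(t) + w, y(t) = H x(t) + v.
# All numeric values below are assumptions chosen for illustration.
F, H = 1.0, 1.0          # linear dynamics and observation matrices (scalars)
Q, R = 0.1, 0.5          # process and measurement noise variances

def kalman_step(mean, var, y):
    # Predict: push the Gaussian belief through the dynamics.
    mean_p = F * mean
    var_p = F * var * F + Q
    # Update: condition on the measurement y via the Kalman gain.
    K = var_p * H / (H * var_p * H + R)
    mean_new = mean_p + K * (y - H * mean_p)
    var_new = (1.0 - K * H) * var_p
    return mean_new, var_new

mean, var = 0.0, 1.0     # x(0) ~ N(0, 1)
for y in [0.9, 1.1, 1.0]:                  # assumed measurements
    mean, var = kalman_step(mean, var, y)
```

Note the same predict/update structure as HMM filtering; only the belief representation changes, from a probability vector to a Gaussian (mean, variance).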

Slide 10

DBN: Dynamic Bayesian Networks
- DBNs are directed graphical models of stochastic processes
- DBNs generalize HMMs and KFMs by representing the hidden and observed state in terms of state variables, which can have complex interdependencies
- The graphical structure provides an easy way to specify these conditional independencies, and hence a compact parameterization of the state-space model
- An extension of BNs to handle temporal models
- Time-invariant: the term "dynamic" means that we are modeling a dynamic system, not that the network changes over time

Slide 11

DBN: a Formal Definition
- A DBN is defined as a pair (B_0, B_→), where B_0 defines the prior P(Z_1), and B_→ is a two-slice temporal Bayes net (2TBN) which defines P(Z_t | Z_t-1) by means of a DAG (directed acyclic graph) as follows:
  - Z_i,t is a node at time slice t; it can be a hidden node, an observation node, or a control node (optional)
  - Pa(Z_i,t) are the parent nodes of Z_i,t; they can be in either time slice t or t-1
  - The nodes in the first slice of a 2TBN do not have parameters associated with them, but every node in the second slice has an associated CPD (conditional probability distribution)
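The 2TBN structure can be represented by two adjacency matrices, one for arcs within a slice and one for arcs from slice t-1 into slice t. A Python sketch for the simple hidden/observed pair used later in the slides (the node numbering is an assumption for illustration):

```python
import numpy as np

# 2TBN structure as two adjacency matrices: node 0 = hidden X, node 1 = observed Y.
n = 2
intra = np.zeros((n, n), dtype=int)
intra[0, 1] = 1           # X_t -> Y_t   (arc within a slice)
inter = np.zeros((n, n), dtype=int)
inter[0, 0] = 1           # X_t-1 -> X_t (arc across slices)

def parents(i):
    """Pa(Z_i,t): parents in slice t (intra) and in slice t-1 (inter)."""
    same = [j for j in range(n) if intra[j, i]]
    prev = [j for j in range(n) if inter[j, i]]
    return {"slice t": same, "slice t-1": prev}
```

This intra/inter split is exactly the input format the BNT MATLAB code on a later slide expects; a CPD is then attached to each node of the second slice.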

Slide 12

DBN representation in BNT (MATLAB)
To specify a DBN, we need to define the intra-slice topology (within a slice), the inter-slice topology (between two slices), as well as the parameters for the first two slices. (Such a two-slice temporal Bayes net is often called a 2TBN.) We can specify the topology as follows:
intra = zeros(2); intra(1,2) = 1; % node 1 in slice t connects to node 2 in slice t
inter = zeros(2); inter(1,1) = 1; % node 1 in slice t-1 connects to node 1 in slice t
We can specify the parameters as follows, where for simplicity we assume the observed node is discrete.
Q = 2; % num hidden states
O = 2; % num observable symbols
ns = [Q O];
dnodes = 1:2;
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
for i=1:4
  bnet.CPD{i} = tabular_CPD(bnet, i);
end
Equivalently, we can tie the CPDs across slices using equivalence classes and pass in explicit (here random) parameters:
eclass1 = [1 2]; eclass2 = [3 2]; eclass = [eclass1 eclass2];
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));
bnet.CPD{1} = tabular_CPD(bnet, 1, prior0);
bnet.CPD{2} = tabular_CPD(bnet, 2, obsmat0);
bnet.CPD{3} = tabular_CPD(bnet, 3, transmat0);

Slide 13

Representation of a DBN in XML format
<dbn>
  <prior>
    <!-- a static BN (DAG) in XMLBIF format, defining the state space at time slice 1 -->
  </prior>
  <transition>
    <!-- a transition network (DAG) involving two time slices, t and t+1; each node has an additional attribute specifying which time slice it belongs to; only nodes in slice t+1 have CPDs -->
  </transition>
</dbn>

Slide 14

The Semantics of a DBN
- First-order Markov assumption: the parents of a node can only be in the same time slice or the previous time slice, i.e., arcs do not skip across slices
- Inter-slice arcs are all from left to right, reflecting the arrow of time
- Intra-slice arcs can be arbitrary, as long as the overall DBN is a DAG
- Time-invariance assumption: the parameters of the CPDs do not change over time
- The semantics of a DBN can be defined by "unrolling" the 2TBN to T time slices. The resulting joint probability distribution is then defined by
  P(Z_1:T) = ∏_{t=1..T} ∏_{i=1..N} P(Z_i,t | Pa(Z_i,t))
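For an HMM-shaped DBN this unrolled product is just prior × transitions × observations. A Python/NumPy sketch that evaluates it for one concrete trajectory (all parameter values are illustrative assumptions):

```python
import numpy as np

# Illustrative HMM-shaped DBN (assumed numbers, as elsewhere in these notes).
prior = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X_t | X_t-1)
B = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(Y_t | X_t)

def unrolled_joint(xs, ys):
    """P(x_1:T, y_1:T) as the product of every node's CPD in the unrolled net."""
    p = prior[xs[0]] * B[xs[0], ys[0]]                 # slice 1
    for t in range(1, len(xs)):
        p *= A[xs[t - 1], xs[t]] * B[xs[t], ys[t]]     # slices 2..T
    return p

p = unrolled_joint([0, 0], [0, 1])    # = 0.6 * 0.9 * 0.7 * 0.1
```

Each factor in the loop corresponds to one node of the unrolled DAG conditioned on its parents, which is exactly the product formula above.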

Slide 15

DBN, HMM, and KFM
- An HMM's state space consists of a single random variable; a DBN represents the hidden state in terms of a set of random variables
- A KFM requires all the CPDs to be linear Gaussian; a DBN allows arbitrary CPDs
- HMMs and KFMs have a restricted topology; a DBN allows much more general graph structures
- The DBN generalizes the HMM and KFM, and has more expressive power

Slide 16

DBN: Inference
- The goal of inference in DBNs is to compute P(X_t | y_1:r):
  - Filtering: r = t
  - Smoothing: r > t
  - Prediction: r < t
  - Viterbi: MPE, x*_1:t = argmax P(x_1:t | y_1:t)

Slide 17

DBN Inference Algorithms
- DBN inference algorithms extend HMM and KFM inference algorithms, and call BN inference algorithms as subroutines
- DBN inference is NP-hard
- Exact inference algorithms:
  - Forwards-backwards smoothing algorithm (on any discrete-state DBN)
  - The frontier algorithm (sweep a Markov blanket, the frontier set F, over the DBN, first forwards and then backwards)
  - The interface algorithm (use only the set of nodes with outgoing arcs to the next time slice to d-separate the past from the future)
  - Kalman filtering and smoothing
- Approximate algorithms:
  - The Boyen-Koller (BK) algorithm (approximate the joint distribution over the interface as a product of marginals)
  - Factored frontier (FF) algorithm
  - Loopy belief propagation (LBP) algorithm
  - Kalman filtering and smoothing
  - Stochastic sampling algorithms: importance sampling or MCMC (offline inference); particle filtering (PF) (online)
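Of these, particle filtering is the easiest to sketch: propagate a population of samples through the transition model, weight them by the observation likelihood, and resample. A Python/NumPy sketch on an illustrative 2-state HMM (parameters, observations, and particle count are all assumptions):

```python
import numpy as np

# Illustrative 2-state HMM (assumed numbers) and a basic bootstrap particle filter.
rng = np.random.default_rng(1)
prior = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X_t | X_t-1)
B = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(Y_t | X_t)
N = 2000                                  # number of particles

particles = rng.choice(2, size=N, p=prior)     # sample X_1 for each particle
for y in [0, 0, 1]:                            # assumed observation sequence
    # Propagate each particle through the transition model.
    particles = np.array([rng.choice(2, p=A[x]) for x in particles])
    # Weight by the observation likelihood, then resample.
    w = B[particles, y]
    w = w / w.sum()
    particles = rng.choice(particles, size=N, p=w)

# The particle histogram approximates the filtering distribution P(X_t | y_1:t).
belief_est = np.bincount(particles, minlength=2) / N
```

On a 2-state HMM exact filtering is of course cheap; the point of the sketch is that the same propagate/weight/resample loop works unchanged when the state space is too large or continuous, which is where exact DBN inference becomes intractable.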

Slide 18

DBN: Learning
- The techniques for learning DBNs are mostly straightforward extensions of the techniques for learning BNs
- Parameter learning