KLD_generator Database name of original dataset and BN database are needed. 1. smoothed_CP input: the biggest rchain output: smoothed conditional probabiliy for each node in given rchain workflow: 1) for the node that has parents: loop all such nodes new_table_smoothed: generate smoothed CP for one node 1.1 pair_table: generate full pairs table (we mean pairs of child value + parent state) join all node columns (e.g. ChildeValue, b, grade, sat...) in original CP table to get all possible pairs 1.2 insert rows into smoothed CP that exist in original CP table 1.3 insert rows into smoothed CP that not exist in original CP table, MULT=0, from full pair table pair table is temprory table, can be dropped here 1.4 MULT=MULT+1 1.5 update_ps: calculate parent_sum and CP create a temp table to get parent_sum for smoothed CP update parentsum and CP column in smoothed CP table 2) for the node that doesn't have parents loop all such nodes copy values from original CP table and ++MULT, recalculate CP column 2. create_join_CP input: the biggest rchain output: KLD for the given rchain Create one KLD table for given rchain, based on table `rchain_CT`. The structure of KLD table is similar to `rchain_CT`, and add columns for conditional probability for each node in rchain, joined probability (both smoothed and unsmoothed), and KLD. workflow: 1) create KLD table 2) insert value for each node into KLD table (select * from `rchain_CT`) 3) insert_CP_Values: insert CP value to each node for the node without parents loop all such nodes set the CP value of that node equal to its smoothed CP value when the node value match for the node has parents loop all such nodes set the CP value of that node equal to its smoothed CP value when the node value match and the values of all its parents mathch 4) cal_KLD: calculate JP, JP_DB and KLD JP(smoothed) = the product of conditional probability of all nodes in given rchain JP_DB(unsmoothed) = mult / sum(mult) KLD = JP_DB * LOG(JP_DB / JP) 3. generate_CLL input: the biggest rchain output: CLL table of each node in given rchain workflow: loop all nodes in given rchain: generate_CLL_node: generate CLL table for one node 1) markov_blank: get markov blanket of one node in given rchain, and return the result in an array list get children, parents and spouses of the node and store in a set (to reduce all duplicate nodes) turn the set into an array list and return the list 2) create CLL table of the node, containing these columns: the value of the target node and all nodes in its markov blanket, JP_DB, JP_DB_blanket, CLL_DB, JP, JP_blanket, CLL_JP compute conditional probability by dividing joint probability of child + markov blanket over joint prob of markov blanket. JP_DB = sum(JP_DB) in KLD table of given rchain, group by the target node and all nodes in its markov blanket JP = sum(JP) in KLD table of given rchain, group by the target node and all nodes in its markov blanket JP_DB_blanket = sum(JP_DB) in CLL table of given node and group by all nodes in its markov blanket JP_blanket = sum(JP) in CLL table of given node and group by all nodes in its markov blanket CLL_DB = JP_DB / JP_DB_blanket CLL = JP / JP_blanket