The configuration file looks like this:
Here's what the options mean:
- dbname is the name of the relational database for which you want to build a Bayes net model.
- dbusername is the name of a user who can connect to the database. This user needs sufficient privileges to create and delete schemas and to create new tables.
- dbpassword is the password this user uses to connect to the database.
- dbaddress is the address of the MySQL host. It should begin with mysql://.
- AutomaticSetup. The recommended default option is 1. Set it to 1 if you want the program to extract schema information automatically from the system catalog. The program uses this information to define a default set of nodes for the Bayes net model. It also uses the setup information to create SQL queries for computing sufficient statistics and for implementing other statistical operations.
The setup information is stored as a set of database tables in a schema named dbname_setup, where dbname is your database name. You can edit the setup database directly if you don't like the default choices. If you make changes, run the system again with AutomaticSetup = 0. In that mode, the system treats the setup database as read-only.
- LinkCorrelations. The recommended default option is 1. The published version of the Learn-and-Join algorithm analyzes dependencies among attributes given the link structure, but does not model dependencies among links. The new version learns a joint dependency model for both links and attributes. Modelling link dependencies is expensive in terms of both memory and runtime. If you run short of system resources, you may want to set this option to 0. This can also be a good way to test the system before you learn a full model.
- ComputeKLD. The recommended default option is 0. This option does not affect learning, only the evaluation of the learned model: it checks the probabilistic query answers provided by the Bayes net against the frequencies of events in the database. This involves running many complex SQL queries, which can take a long time.
- Continuous. If this option is 0, the system assumes that all columns in the database hold discrete data. If it is set to 1, the system assumes that all columns hold continuous data. Since link indicators are always discrete (a link either exists or it does not), setting Continuous = 1 also sets LinkCorrelations = 0; that is, no dependencies involving links are learned. Our program currently does not handle databases that mix discrete and continuous columns. This is mainly a limitation we inherit from the Tetrad Bayes net learner, not one of the Learn-and-Join algorithm.
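To picture what AutomaticSetup does with the system catalog, here is a small, hypothetical sketch. It assumes a common convention for relational Bayes nets: a table with two or more foreign keys is treated as a relationship table and gets one link indicator node, and every column that is not part of a key becomes an attribute node. The real dbname_setup schema records more than this, and the names TableInfo and default_nodes are illustrative only, not part of the program.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TableInfo:
    """Catalog facts for one table (hypothetical structure)."""
    columns: List[str]
    primary_key: Set[str] = field(default_factory=set)
    foreign_keys: Set[str] = field(default_factory=set)


def default_nodes(catalog: Dict[str, TableInfo]) -> Dict[str, List[str]]:
    """Derive link and attribute nodes from catalog metadata.

    Assumption: a table with two or more foreign keys is a relationship
    table; every column that is not part of a key is an attribute.
    """
    links: List[str] = []
    attributes: List[str] = []
    for table, info in sorted(catalog.items()):
        if len(info.foreign_keys) >= 2:
            links.append(table)  # one link indicator node per relationship
        keys = info.primary_key | info.foreign_keys
        attributes.extend(
            f"{table}.{col}" for col in info.columns if col not in keys
        )
    return {"links": links, "attributes": attributes}


if __name__ == "__main__":
    # In MySQL, these facts could be read from information_schema.COLUMNS
    # and information_schema.KEY_COLUMN_USAGE; here they are hard-coded.
    catalog = {
        "student": TableInfo(["s_id", "intelligence"], {"s_id"}),
        "course": TableInfo(["c_id", "difficulty"], {"c_id"}),
        "registration": TableInfo(
            ["s_id", "c_id", "grade"], {"s_id", "c_id"}, {"s_id", "c_id"}
        ),
    }
    print(default_nodes(catalog))
```

With this toy catalog, registration becomes the only link node, and intelligence, difficulty and grade become attribute nodes. Editing the setup tables and rerunning with AutomaticSetup = 0 corresponds to overriding such defaults by hand.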
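The option rules above can be checked before starting a long run. The following is a minimal sketch, assuming the configuration file holds one `key = value` pair per line and allows `#` comments; that file syntax, the function name parse_config, and the default of 0 for Continuous are assumptions rather than the program's documented behavior. The other defaults are the recommended values listed above.

```python
from typing import Dict, Union

REQUIRED = ("dbname", "dbusername", "dbpassword", "dbaddress")

# Recommended defaults from the option descriptions; the default for
# Continuous is a hypothetical choice (discrete data).
FLAG_DEFAULTS = {
    "AutomaticSetup": 1,
    "LinkCorrelations": 1,
    "ComputeKLD": 0,
    "Continuous": 0,
}


def parse_config(text: str) -> Dict[str, Union[str, int]]:
    """Parse hypothetical `key = value` lines and apply the documented rules."""
    raw: Dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        key, value = line.split("=", 1)
        raw[key.strip()] = value.strip()

    missing = [k for k in REQUIRED if k not in raw]
    if missing:
        raise ValueError("missing options: " + ", ".join(missing))
    if not raw["dbaddress"].startswith("mysql://"):
        raise ValueError("dbaddress should begin with mysql://")

    config: Dict[str, Union[str, int]] = {k: raw[k] for k in REQUIRED}
    for flag, default in FLAG_DEFAULTS.items():
        value = int(raw.get(flag, default))
        if value not in (0, 1):
            raise ValueError(f"{flag} must be 0 or 1, got {value}")
        config[flag] = value

    # Link indicators are always discrete, so continuous mode
    # turns off link correlations.
    if config["Continuous"] == 1:
        config["LinkCorrelations"] = 0
    return config


if __name__ == "__main__":
    sample = """
    dbname = example_db
    dbusername = bn_user
    dbpassword = secret
    dbaddress = mysql://localhost
    Continuous = 1
    """
    print(parse_config(sample))
```

The sample values are placeholders. Forcing LinkCorrelations to 0 inside the parser mirrors the documented rule for Continuous = 1, so later stages never see an inconsistent combination.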
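The ComputeKLD evaluation compares model answers with database frequencies. The option name suggests a Kullback-Leibler divergence (KLD); the sketch below assumes that measure, taken from the data distribution p to the model distribution q, and works on small in-memory distributions instead of the SQL queries the program actually runs. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def empirical_distribution(events: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each observed event (e.g. a tuple of column values)."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) = sum over x of p(x) * log(p(x) / q(x)), in nats.

    p holds the database frequencies, q the model's probabilities.
    Events with p(x) = 0 contribute nothing; if the model gives probability 0
    to an event that occurs in the data, the divergence is infinite.
    """
    divergence = 0.0
    for event, px in p.items():
        if px == 0:
            continue
        qx = q.get(event, 0.0)
        if qx == 0:
            return math.inf
        divergence += px * math.log(px / qx)
    return divergence


if __name__ == "__main__":
    # Hypothetical data: grades observed in a registration table.
    observed = empirical_distribution(["A", "A", "B", "B"])
    model = {"A": 0.25, "B": 0.75}  # hypothetical Bayes net predictions
    print(kl_divergence(observed, model))  # about 0.1438
```

A divergence of 0 means the model reproduces the observed frequencies exactly. In a real database the number of joint events grows quickly with the number of columns, which is one reason this evaluation is slow.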