'Create Bayesian Network and learn parameters with Python3.x [closed]
I'm searching for the most appropriate tool for python3.x on Windows to create a Bayesian Network, learn its parameters from data and perform the inference.
The network structure I want to define myself as follows:
It is taken from this paper.
All the variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as Mixture of Gaussians.
Authors use Expectation-Maximization algorithm to learn the parameters for conditional probability tables and Junction-Tree algorithm to compute the exact inference.
As I understand all is realised in MatLab with Bayes Net Toolbox by Murphy.
I tried to search something similar in python and here are my results:
Python Bayesian Network Toolbox http://sourceforge.net/projects/pbnt.berlios/ (http://pbnt.berlios.de/). Web-site doesn't work, project doesn't seem to be supported.
BayesPy https://github.com/bayespy/bayespy I think this is what I actually need, but I fail to find some examples similar to my case, to understand how to approach construction of the network structure.
PyMC seems to be a powerful module, but I have problems with importing it on Windows 64, python 3.3. I get error when I install development version
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
UPDATE:
- libpgm (http://pythonhosted.org/libpgm/). Exactly what I need, unfortunately not supported by python 3.x
- Very interesting actively developing library: PGMPY. Unfortunately continuous variables and learning from data is not supported yet. https://github.com/pgmpy/pgmpy/
Any advices and concrete examples will be highly appreciated.
Solution 1:[1]
It looks like pomegranate was recently updated to include Bayesian Networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.
Solution 2:[2]
Try the bnlearn library, it contains many functions to learn parameters from data and perform the inference.
pip install bnlearn
Your use-case would be like this:
# Import the library
import bnlearn
# Define the network structure
edges = [('task', 'size'),
('lat var', 'size'),
('task', 'fill level'),
('task', 'object shape'),
('task', 'side graspable'),
('size', 'GrasPose'),
('task', 'GrasPose'),
('fill level', 'GrasPose'),
('object shape', 'GrasPose'),
('side graspable', 'GrasPose'),
('GrasPose', 'latvar'),
]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# DAG is stored in adjacency matrix
print(DAG['adjmat'])
# target task size lat var ... side graspable GrasPose latvar
# source ...
# task False True False ... True True False
# size False False False ... False True False
# lat var False True False ... False False False
# fill level False False False ... False True False
# object shape False False False ... False True False
# side graspable False False False ... False True False
# GrasPose False False False ... False False True
# latvar False False False ... False False False
#
# [8 rows x 8 columns]
# No CPDs are in the DAG. Lets see what happens if we print it.
bnlearn.print_CPD(DAG)
# >[BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot DAG. Note that it can be differently orientated if you re-make the plot.
bnlearn.plot(DAG)
Now we need the data to learn its parameters. Suppose these are stored in your df. The variable names in the data-file must be present in the DAG.
# Read data
df = pd.read_csv('path_to_your_data.csv')
# Learn the parameters and store CPDs in the DAG. Use the methodtype your desire. Options are maximumlikelihood or bayes.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')
# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)
# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level':1, 'size':0, 'task':1})
Below is a working example with a demo dataset (sprinkler). You can play around with this.
# Import example dataset
df = bnlearn.import_example('sprinkler')
print(df)
# Cloudy Sprinkler Rain Wet_Grass
# 0 0 0 0 0
# 1 1 0 1 1
# 2 0 1 0 1
# 3 1 1 1 1
# 4 1 1 1 1
# .. ... ... ... ...
# 995 1 0 1 1
# 996 1 0 1 1
# 997 1 0 1 1
# 998 0 0 0 0
# 999 0 1 1 1
# [1000 rows x 4 columns]
# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
('Cloudy', 'Rain'),
('Sprinkler', 'Wet_Grass'),
('Rain', 'Wet_Grass')]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot the DAG
bnlearn.plot(DAG)
# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)
# Print the learned CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695 |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain | Rain(0) | Rain(0) | Rain(1) | Rain(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler | Sprinkler(0) | Sprinkler(1) | Sprinkler(0) | Sprinkler(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663 | 0.7441176470588236 | 0.6208955223880597 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})
# +--------------+------------------+
# | Wet_Grass | phi(Wet_Grass) |
# +==============+==================+
# | Wet_Grass(0) | 0.2559 |
# +--------------+------------------+
# | Wet_Grass(1) | 0.7441 |
# +--------------+------------------+
print(q1.values)
# array([0.25588235, 0.74411765])
More examples can be found on documentation the pages of bnlearn or read the blog.
Solution 3:[3]
I was looking for a similar library, and I found that the pomegranate is a good one. Thanks James Atwood
Here is an example how to use it.
from pomegranate import *
import numpy as np
mydb=np.array([[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,3,8],[2,3,8],[1,2,4]])
bnet = BayesianNetwork.from_samples(mydb)
print(bnet.node_count())
print(bnet.probability([[1,2,3]]))
print (bnet.probability([[1,2,8]]))
Solution 4:[4]
For pymc's g++ problem, I highly recommend to get g++ installation done, it would hugely boost the sampling process, otherwise you will have to live with this warning and sit there for 1 hour for a 2000 sampling process.
The way to get the warning fixed is: 1. get g++ installed, download cywing and get g++ install, you can google that. To check this, just go to "cmd" and type "g++", if it says "require input file", great, you got g++ installed. 2. install python package: mingw, libpython 3. install python package: theano
this should get this problem fixed.
I am currently working on the same problem with you, good luck!
Solution 5:[5]
Late to the party, as always, but I've wrapped up the BayesServer Java API using JPype; it might not have all the functionality that you need but you would create the above network using something like:
from bayesianpy.network import Builder as builder
import bayesianpy.network
nt = bayesianpy.network.create_network()
# where df is your dataframe
task = builder.create_discrete_variable(nt, df, 'task')
size = builder.create_continuous_variable(nt, 'size')
grasp_pose = builder.create_continuous_variable(nt, 'GraspPose')
builder.create_link(nt, size, grasp_pose)
builder.create_link(nt, task, grasp_pose)
for v in ['fill level', 'object shape', 'side graspable']:
va = builder.create_discrete_variable(nt, df, v)
builder.create_link(nt, va, grasp_pose)
builder.create_link(nt, task, va)
# write df to data store
with bayesianpy.data.DataSet(df, bayesianpy.utils.get_path_to_parent_dir(__file__), logger) as dataset:
model = bayesianpy.model.NetworkModel(nt, logger)
model.train(dataset)
# to query model multi-threaded
results = model.batch_query(dataset, [bayesianpy.model.QueryModelStatistics()], append_to_df=False)
I'm not affiliated with Bayes Server - and the Python wrapper is not 'official' (you can use the Java API via Python directly). My wrapper makes some assumptions and places limitations on functions that I don't use very much. The repo is here: github.com/morganics/bayesianpy
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | James Atwood |
Solution 2 | |
Solution 3 | |
Solution 4 | Teng Fu |
Solution 5 | morganics |