# AM41PB: Probabilistic Modelling Coursework, University of Edinburgh (UoE), UK

University: University of Edinburgh (UoE)

Subject: AM41PB: Probabilistic Modelling

A total of 10% of the marks will be given for the presentation and quality of the exposition. The submission should be readable, well organized, and professional looking.

1. Bayesian approach – The Pareto distribution is a power-law-like probability distribution that describes the distribution of wealth in a society, fitting the trend that a large portion of wealth is held by a small fraction of the population. It takes the form

$$p(x \mid \lambda) = \frac{\lambda\, x_M^{\lambda}}{x^{\lambda+1}}, \qquad x \ge x_M,$$

where it provides the probability density of $x$ above a threshold $x_M \ge 0$.

On BlackBoard, under the Assignments section, please find a dataset of the income distribution by percentile of the UK population for the tax years 2010-11 to 2018-19. Use the last digit of your student number to choose the tax year you will be working on (0 → 2010-11, 1 → 2011-12, ..., 8, 9 → 2018-19).

(a) Show that the Pareto distribution is indeed a probability distribution.
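A hedged sketch of the argument for part (a), assuming the standard Pareto density $p(x \mid \lambda) = \lambda x_M^{\lambda} / x^{\lambda+1}$ for $x \ge x_M$ (zero otherwise) with $\lambda > 0$ and a strictly positive threshold $x_M > 0$: the density is non-negative by inspection, and normalisation follows from a single integral.

```latex
\int_{x_M}^{\infty} \frac{\lambda\, x_M^{\lambda}}{x^{\lambda+1}}\, dx
  = \lambda\, x_M^{\lambda} \left[ -\frac{x^{-\lambda}}{\lambda} \right]_{x_M}^{\infty}
  = x_M^{\lambda}\, x_M^{-\lambda}
  = 1 .
```

The upper limit vanishes only because $\lambda > 0$, which is why that condition is part of the assumption.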

(b) Plot the data as the percentage of the population (y-axis) with income above certain values (x-axis); to simplify the presentation present the income in thousands of pounds.
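The transformation asked for in part (b) can be sketched as follows. The layout of the BlackBoard dataset is not given here, so the input format (pairs of percentile and income in pounds) and the function name `survival_points` are assumptions made for illustration, and the numbers in the example call are made-up placeholders rather than real data.

```python
def survival_points(percentile_income):
    """Convert (percentile, income in pounds) pairs into
    (income in thousands of pounds, % of population above that income).

    Assumes the income at percentile p is the value below which p% of the
    population falls, so (100 - p)% of the population earns more.
    """
    points = []
    for percentile, income in percentile_income:
        points.append((income / 1000.0, 100.0 - percentile))
    return sorted(points)


# Made-up placeholder values, NOT taken from the real dataset.
toy = [(10, 12000), (50, 23000), (90, 51000)]
for income_k, pct_above in survival_points(toy):
    print(f"{pct_above:5.1f}% of the population earns more than £{income_k:.0f}k")
```

The resulting pairs can be passed to any plotting tool, with income on the x-axis and percentage on the y-axis; logarithmic axes make power-law behaviour easier to see.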

(c) Assuming the data are i.i.d. and represented by the Pareto distribution, find the maximum likelihood expression for the parameter $\lambda_{ML}$. Obtain the numerical value of $\lambda_{ML}$ given data $D$ and $x_M$ = £15,000. Specify the data you are working with and the number of data points used.
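A minimal sketch of the estimator in part (c), assuming the standard Pareto density $\lambda x_M^{\lambda} / x^{\lambda+1}$ for $x \ge x_M$. Setting the derivative of the log-likelihood $N \ln\lambda + N\lambda \ln x_M - (\lambda+1)\sum_i \ln x_i$ to zero gives $\lambda_{ML} = N / \sum_i \ln(x_i / x_M)$. The function name and the choice to discard observations below $x_M$ are illustrative assumptions.

```python
import math


def pareto_lambda_ml(data, x_m):
    """Return (lambda_ML, N) for a Pareto model with known threshold x_m.

    Only observations with x >= x_m are used, since the assumed density is
    zero below the threshold. lambda_ML = N / sum(ln(x_i / x_m)).
    """
    kept = [x for x in data if x >= x_m]
    log_sum = sum(math.log(x / x_m) for x in kept)
    if log_sum == 0:
        raise ValueError("need at least one observation strictly above x_m")
    return len(kept) / log_sum, len(kept)
```

Returning the number of retained points alongside the estimate makes it easy to report how many data points were used.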

(d) For the data set $D = \{x_1, \ldots, x_N\}$, show that the Gamma probability distribution function is the conjugate prior for the Pareto distribution. Derive the corresponding posterior $p(\lambda \mid D)$.
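One way the conjugacy argument in part (d) can be laid out, assuming the standard Pareto density with known $x_M$ and a Gamma prior in shape-rate form with hyperparameters $\alpha$ and $\beta$ (the parametrisation is an assumption; a scale parametrisation would change the update for $\beta$):

```latex
p(D \mid \lambda) = \prod_{i=1}^{N} \frac{\lambda\, x_M^{\lambda}}{x_i^{\lambda+1}}
  \;\propto\; \lambda^{N} \exp\!\Big( -\lambda \sum_{i=1}^{N} \ln\frac{x_i}{x_M} \Big),
\qquad
p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda}

p(\lambda \mid D) \;\propto\; p(D \mid \lambda)\, p(\lambda)
  \;\propto\; \lambda^{\alpha+N-1}
  \exp\!\Big( -\lambda \Big[ \beta + \sum_{i=1}^{N} \ln\frac{x_i}{x_M} \Big] \Big)

\Longrightarrow\quad
p(\lambda \mid D) = \mathrm{Gam}\Big( \lambda \,\Big|\, \alpha + N,\;
  \beta + \sum_{i=1}^{N} \ln\frac{x_i}{x_M} \Big)
```

The posterior has the same functional form in $\lambda$ as the prior, which is the defining property of a conjugate prior.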

(e) Assuming the data is represented by the Pareto distribution, find the Maximum A Posteriori expression of the parameter $\lambda_{MAP}$. Use the prior values $\alpha = 1$ and $\beta = 1$ to obtain the numerical value of $\lambda_{MAP}$ given data $D$.

(f) Use the values obtained for the parameters $\lambda_{ML}$ and $\lambda_{MAP}$ to compare the estimated percentage of the population with income above £20,000.
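Parts (e) and (f) can be sketched together. This assumes the standard Pareto density with known $x_M$ and a Gamma prior in shape-rate form, for which the posterior is Gamma with shape $\alpha + N$ and rate $\beta + \sum_i \ln(x_i/x_M)$; its mode gives $\lambda_{MAP} = (\alpha + N - 1) / (\beta + \sum_i \ln(x_i/x_M))$. Under the same model the fraction of the population above $x$ is $(x_M/x)^{\lambda}$. Function names are illustrative.

```python
import math


def pareto_lambda_map(data, x_m, alpha=1.0, beta=1.0):
    """Mode of the Gamma(alpha + N, beta + sum(ln(x_i / x_m))) posterior.

    Assumes a shape-rate Gamma prior; only observations x >= x_m are used.
    """
    kept = [x for x in data if x >= x_m]
    log_sum = sum(math.log(x / x_m) for x in kept)
    return (alpha + len(kept) - 1.0) / (beta + log_sum)


def percent_above(x, x_m, lam):
    """Percentage of the population with income above x (x >= x_m)
    under a Pareto model: 100 * (x_m / x) ** lam."""
    return 100.0 * (x_m / x) ** lam


# Illustration with an arbitrary lambda, not a fitted value.
print(percent_above(20000, 15000, 1.0))  # 75.0
```

Calling `percent_above` once with each estimate gives the two percentages to compare; with $\alpha = 1$ the MAP numerator reduces to $N$, so the two estimates differ only through $\beta$ in the denominator.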

2. Decision theory – Large Diamonds (D) are exponentially rare and their weight follows an exponential distribution, where $g$ is the weight of a diamond in grams and $\lambda_D$ is a constant. In the same mine, there are also White Sapphire (WS) crystals that look like diamonds but are very regular in weight, following a Gaussian distribution of mean $\mu_S$ and variance $\sigma_S^2$. The average size of diamonds, $\lambda_D$, is much smaller than that of White Sapphire: $\lambda_D \ll \mu_S$. Additionally, diamonds are much rarer, so that $p(WS) \gg p(D)$.

The mine owner wants to automate the sorting of Diamonds vs White Sapphire by weight, setting a threshold weight $g_T$.

(a) Write an expression for the misclassification error of Diamonds as White Sapphire p(D|WS) and of White Sapphire as Diamonds p(WS|D) as functions of the threshold weight $g_T$.

(b) Minimise the total misclassification error with respect to the threshold weight $g_T$. What is the expression obtained for $g_T$ (you can use a simplified notation)?
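A hedged outline of parts (a) and (b), written with generic class-conditional densities $p(g \mid D)$ and $p(g \mid WS)$ so that it does not depend on a particular parametrisation. It assumes the decision rule "classify as Diamond if $g < g_T$, as White Sapphire otherwise", which is natural when diamonds are typically much lighter; the symbols $\varepsilon_{D \to WS}$ and $\varepsilon_{WS \to D}$ are introduced here for the two error terms.

```latex
\varepsilon_{D \to WS}(g_T) = p(D) \int_{g_T}^{\infty} p(g \mid D)\, dg,
\qquad
\varepsilon_{WS \to D}(g_T) = p(WS) \int_{-\infty}^{g_T} p(g \mid WS)\, dg

\frac{d}{d g_T}\Big[ \varepsilon_{D \to WS} + \varepsilon_{WS \to D} \Big]
  = -\,p(D)\, p(g_T \mid D) + p(WS)\, p(g_T \mid WS) = 0
\quad\Longrightarrow\quad
p(D)\, p(g_T \mid D) = p(WS)\, p(g_T \mid WS)
```

The threshold therefore sits where the two prior-weighted densities cross; a crossing is a minimum of the total error when the White Sapphire term exceeds the Diamond term just above it.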

(c) Calculate the value of $g_T$ for the parameters $\lambda_D = 1$, $\mu_S = 5$, $\sigma_S^2 = 1$ and prior probabilities p(WS) = 0.8 and p(D) = 0.2. Sketch the two distributions, and indicate the threshold value $g_T$ and the area that contributes to the misclassification error of Diamonds as White Sapphire.
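The numerical step in part (c) can be sketched by solving for the points where the prior-weighted densities are equal. The sketch assumes the exponential density is parametrised by its mean, $p(g \mid D) = e^{-g/\lambda_D}/\lambda_D$; for $\lambda_D = 1$ the mean and rate parametrisations coincide, so the value for these parameters does not depend on that choice. Taking logarithms of $p(D)\,p(g \mid D) = p(WS)\,p(g \mid WS)$ gives a quadratic in $g$, solved in closed form below. The function name is illustrative.

```python
import math


def threshold_roots(lam_d, mu_s, var_s, p_d, p_ws):
    """Solve p_d * Exp(g; mean lam_d) = p_ws * Normal(g; mu_s, var_s) for g.

    Taking logs yields g**2 - 2*b*g + c = 0 with the coefficients below.
    Returns (lower root, upper root).
    """
    k = math.log(p_d / (lam_d * p_ws)) + 0.5 * math.log(2.0 * math.pi * var_s)
    b = mu_s + var_s / lam_d
    c = mu_s ** 2 + 2.0 * var_s * k
    disc = b * b - c
    if disc < 0:
        raise ValueError("the weighted densities never cross")
    root = math.sqrt(disc)
    return b - root, b + root


g_low, g_high = threshold_roots(1.0, 5.0, 1.0, 0.2, 0.8)
print(g_low, g_high)  # roughly 2.55 and 9.45
```

For a single threshold with lighter stones labelled as Diamonds, the lower crossing is the relevant one: below it the weighted Diamond density dominates, and just above it the weighted White Sapphire density dominates.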

(d) The mine is losing money and the owner realizes that he has not taken into account that large diamonds are exponentially more expensive, and that the misclassification cost of p(WS|D) relative to p(D|WS) (to which, for simplicity, we attribute cost 1) is [...]. Assume $\lambda_C < \lambda_D$. Find the expression for the threshold weight $g_T$ in this case (not a numerical value).

3. Graphical models – James is concerned that he has contracted a new disease (J = 1). He shows suspicious symptoms: he has a sore throat (S = 1) and an elevated fever (F = 1). He has been in touch only with Ruth (R) over the last week and thinks she has infected him.

The following probabilities are given:

• the prior probability of Ruth having the disease is P(R = 1) = 1/4;

• Ruth spent a lot of time with James recently and is likely to have infected him if she is infectious, P(J = 1|R = 1) = 3/4; clearly, she has not infected him if she does not have the disease;

• a person (say James) who has contracted the disease always experiences a sore throat, P(S = 1|J = 1) = 1, but has an elevated fever in only half of the cases, P(F = 1|J = 1) = 1/2; not having contracted the disease, James can still have an elevated fever, P(F = 1|J = 0) = 1/4, and a sore throat, P(S = 1|J = 0) = 1/4.

(a) Plot the directed acyclic graphical model that represents the problem.

(b) Write an expression for the joint probability of all variables in the problem as well as the probabilities given in the question.
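A sketch of the factorisation for part (b), assuming the graph implied by the description: Ruth's status R influences James's status J, which in turn influences both symptoms S and F, with no other edges.

```latex
P(R, J, S, F) = P(R)\, P(J \mid R)\, P(S \mid J)\, P(F \mid J)

P(R{=}1) = \tfrac{1}{4},\quad
P(J{=}1 \mid R{=}1) = \tfrac{3}{4},\quad
P(J{=}1 \mid R{=}0) = 0,

P(S{=}1 \mid J{=}1) = 1,\quad
P(S{=}1 \mid J{=}0) = \tfrac{1}{4},\quad
P(F{=}1 \mid J{=}1) = \tfrac{1}{2},\quad
P(F{=}1 \mid J{=}0) = \tfrac{1}{4}
```

Under this structure S and F are conditionally independent given J, and both depend on R only through J.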

(c) What is the probability that Ruth has the disease given that James has a sore throat and elevated fever P(R = 1|S = 1, F = 1)? [8 marks]

(d) After a while, James drinks some water and realizes that his sore throat was a result of not drinking enough so S = 0. What is the probability that Ruth has the disease now?

(e) Later in the day James is again unclear whether he has a sore throat or not. A lateral flow test (T) shows that Ruth is positive for the disease (T = 1). However, these tests are inaccurate: the probability that a person with the disease tests positive is 3/4, while healthy individuals may also test positive with a probability of 1/4. Given that James has an elevated fever (F = 1), what is the probability that he is ill?
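The posterior queries in parts (c) to (e) can be checked by brute-force enumeration of the joint distribution. The sketch below assumes the structure R → J, J → S, J → F and R → T (the test result depends only on Ruth's status), and assumes that the fever observation F = 1 still holds in part (d). Exact fractions are used to avoid rounding; all names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import product

P_R1 = Fr(1, 4)                      # P(R=1)
P_J1 = {1: Fr(3, 4), 0: Fr(0)}       # P(J=1 | R=r)
P_S1 = {1: Fr(1), 0: Fr(1, 4)}       # P(S=1 | J=j)
P_F1 = {1: Fr(1, 2), 0: Fr(1, 4)}    # P(F=1 | J=j)
P_T1 = {1: Fr(3, 4), 0: Fr(1, 4)}    # P(T=1 | R=r), assumed to depend on R only

NAMES = ("R", "J", "S", "F", "T")


def bern(p_one, value):
    """Probability of a binary value given the probability of value 1."""
    return p_one if value == 1 else 1 - p_one


def joint(r, j, s, f, t):
    """Joint probability under the assumed factorisation."""
    return (bern(P_R1, r) * bern(P_J1[r], j) * bern(P_S1[j], s)
            * bern(P_F1[j], f) * bern(P_T1[r], t))


def posterior(query, evidence):
    """P(query = 1 | evidence), summing the joint over all assignments."""
    num = den = Fr(0)
    for values in product((0, 1), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != val for name, val in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[query] == 1:
            num += p
    return num / den


print(posterior("R", {"S": 1, "F": 1}))  # part (c): 25/37
print(posterior("R", {"S": 0, "F": 1}))  # part (d): 1/13
print(posterior("J", {"T": 1, "F": 1}))  # part (e): 6/11
```

Variables that are neither queried nor observed (for example S in part (e)) are summed out automatically by the enumeration.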