“Probability theory is nothing but common sense reduced to calculation” (Laplace).
This module was adapted from E.T. Jaynes’ manuscript entitled: “Probability Theory with Applications to Science and Engineering – A Series of Informal Lectures”, 1974. The entire manuscript is available at http://bayes.wustl.edu/etj/science.pdf.html.
A second and significantly expanded edition of this manuscript is available on Amazon. The first 3 chapters of the second edition are available here http://bayes.wustl.edu/etj/prob/book.pdf.
Denote propositions by A, B, etc., and their denials by $A_c$, $B_c$, etc. Define the logical product and logical sum by
$AB \equiv$ “Both A and B are true”
$A + B \equiv$ “At least one of the propositions A, B is true”
Deductive reasoning then consists of applying relations such as
$A + A = A$;
$A(B + C) = (AB) + (AC)$;
if $D = A_c B_c$ then $D_c = A + B$.
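The three relations above can be checked mechanically by enumerating every truth assignment. The following is a minimal sketch, assuming propositions are modelled as Python booleans; the helper names (`denial`, `logical_product`, `logical_sum`, `holds_for_all_assignments`) are illustrative and do not come from Jaynes' manuscript.

```python
from itertools import product


def denial(a: bool) -> bool:
    """Denial A_c of a proposition A."""
    return not a


def logical_product(a: bool, b: bool) -> bool:
    """AB: both A and B are true."""
    return a and b


def logical_sum(a: bool, b: bool) -> bool:
    """A+B: at least one of A, B is true."""
    return a or b


def holds_for_all_assignments(relation, n_props: int) -> bool:
    """Return True if the relation holds for every truth assignment."""
    return all(
        relation(*values) for values in product([False, True], repeat=n_props)
    )


# A + A = A
idempotence = holds_for_all_assignments(lambda a: logical_sum(a, a) == a, 1)

# A(B + C) = (AB) + (AC)
distributivity = holds_for_all_assignments(
    lambda a, b, c: logical_product(a, logical_sum(b, c))
    == logical_sum(logical_product(a, b), logical_product(a, c)),
    3,
)

# If D = A_c B_c then D_c = A + B
de_morgan = holds_for_all_assignments(
    lambda a, b: denial(logical_product(denial(a), denial(b)))
    == logical_sum(a, b),
    2,
)

print(idempotence, distributivity, de_morgan)
```

Exhaustive enumeration is only practical for a handful of propositions, which is all these identities need.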
Inductive logic is the extension of deductive logic, describing the reasoning of an idealized “robot”, who represents degrees of plausibility of a logical proposition by real numbers:
$p(A \mid B)$ = probability of A, given B.
We keep the original term “robot” advocated by Jaynes; it stands for inductive reasoning that follows a set of consistent rules that can be agreed upon. In this formulation of probability theory, conditional probabilities are fundamental. The elementary requirements of common sense and consistency determine these basic rules of reasoning (see Jaynes for the derivation).
In these rules, one can think of the proposition $C$ as being the prior information that is available to assign probabilities to logical propositions, but these rules are true without this interpretation.
Rule 1:
$$p(AB \mid C) = p(A \mid BC)\,p(B \mid C) = p(B \mid AC)\,p(A \mid C)$$
Rule 2:
$$p(A \mid B) + p(A_c \mid B) = 1$$
Rule 3:
$$p(A + B \mid C) = p(A \mid C) + p(B \mid C) - p(AB \mid C)$$
Rule 4: If $\{A_1, \ldots, A_n\}$ are mutually exclusive and exhaustive, and the information $B$ is indifferent to them, i.e., if $B$ gives no preference to one over any other, then:
$$p(A_i \mid B) = 1/n, \quad i = 1, \ldots, n$$ (principle of insufficient reason)
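As a numerical sanity check, the four rules can be evaluated on a small example. The sketch below assumes a made-up joint distribution over the truth values of A and B (given C) and defines conditional probabilities as ratios of joint weights, so the Rule 1 check is a consistency check by construction rather than a derivation. All names and numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution p(A, B | C) over truth values of A and B.
# The four weights are invented for illustration and sum to 1.
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}


def prob(event):
    """p(event | C): total weight of the assignments where event holds."""
    return sum(w for (a, b), w in joint.items() if event(a, b))


def cond(event, given):
    """p(event | given, C), computed as a ratio of joint weights."""
    return prob(lambda a, b: event(a, b) and given(a, b)) / prob(given)


def A(a, b):
    return a


def B(a, b):
    return b


def not_A(a, b):
    return not a


def A_and_B(a, b):
    return a and b


def A_or_B(a, b):
    return a or b


# Rule 1: p(AB|C) = p(A|BC) p(B|C) = p(B|AC) p(A|C)
rule1 = prob(A_and_B) == cond(A, B) * prob(B) == cond(B, A) * prob(A)

# Rule 2: p(A|B) + p(A_c|B) = 1
rule2 = cond(A, B) + cond(not_A, B) == 1

# Rule 3: p(A+B|C) = p(A|C) + p(B|C) - p(AB|C)
rule3 = prob(A_or_B) == prob(A) + prob(B) - prob(A_and_B)

# Rule 4: n exclusive, exhaustive alternatives with no preference among them,
# for example the six faces of a die about which nothing else is known.
n = 6
rule4_assignment = [Fraction(1, n)] * n

print(rule1, rule2, rule3, sum(rule4_assignment))
```

Exact fractions are used instead of floats so that the equalities can be tested without rounding tolerances.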
From Rule 1 we obtain Bayes’ theorem:
$$p(A \mid BC) = p(A \mid C)\,\frac{p(B \mid AC)}{p(B \mid C)}$$
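To make Bayes' theorem concrete, here is a minimal sketch with invented numbers: A is “the component is defective” and B is “the test flags the component”. The function name `bayes` and all three input probabilities are assumptions chosen for illustration.

```python
from fractions import Fraction


def bayes(prior_a, likelihood_b_given_a, prob_b):
    """Return p(A|BC) = p(A|C) * p(B|AC) / p(B|C)."""
    if prob_b == 0:
        raise ValueError("p(B|C) must be nonzero")
    return prior_a * likelihood_b_given_a / prob_b


# Hypothetical inputs (not from the source text).
p_a = Fraction(1, 100)          # p(A|C): prior probability of a defect
p_b_given_a = Fraction(19, 20)  # p(B|AC): test flags a defective component
# p(B|C): overall probability of a flag. The value 59/1000 is what results
# if the test also flags a sound component with probability 1/20.
p_b = Fraction(59, 1000)

posterior = bayes(p_a, p_b_given_a, p_b)
print(posterior)  # 19/118

# Both factorizations of p(AB|C) in Rule 1 agree.
assert posterior * p_b == p_b_given_a * p_a
```

With these numbers a flagged component is defective with probability 19/118, roughly 16%, because the prior probability of a defect is small.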
From Rule 3, if $\{A_1, \ldots, A_n\}$ are mutually exclusive,
$$p(A_1 + \cdots + A_n \mid B) = \sum_{i=1}^{n} p(A_i \mid B)$$
If, in addition, the $A_i$ are exhaustive, we obtain the chain rule:
$$p(B \mid C) = \sum_{i=1}^{n} p(BA_i \mid C) = \sum_{i=1}^{n} p(B \mid A_i C)\,p(A_i \mid C)$$
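The chain rule can be illustrated with a small urn problem. In this sketch the $A_i$ are “the draw came from urn i” for three urns, B is “the drawn ball is red”, and the numbers are invented; `total_probability` is a hypothetical helper name.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Return p(B|C) = sum_i p(B|A_i C) p(A_i|C).

    The A_i must be mutually exclusive and exhaustive, so the
    priors p(A_i|C) are required to sum to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    if sum(priors) != 1:
        raise ValueError("p(A_i|C) must sum to 1 for exhaustive A_i")
    return sum(l * p for l, p in zip(likelihoods, priors))


# Hypothetical setup: three urns, none preferred, so Rule 4 gives 1/3 each.
priors = [Fraction(1, 3)] * 3
# p(B|A_i C): fraction of red balls in each urn (invented values).
likelihoods = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

p_b = total_probability(priors, likelihoods)
print(p_b)  # 1/2
```

The result is the denominator $p(B \mid C)$ that Bayes' theorem requires when only the conditional probabilities $p(B \mid A_i C)$ are known.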
The initial information available to the robot at the beginning of any problem is denoted by $X$. $p(A \mid X)$ is then the prior probability of $A$. Applying Bayes’ theorem to take account of new evidence $E$ yields the posterior probability $p(A \mid EX)$. In a posterior probability we sometimes leave off the $X$ for brevity:
$$p(A \mid E) \equiv p(A \mid EX).$$
Prior probabilities are determined by Rule 4 when applicable, or more generally by the principle of maximum entropy.
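The passage from prior to posterior can be sketched as a repeated update, where each posterior serves as the prior for the next piece of evidence. The example assumes three hypotheses about a coin's probability of heads, a Rule 4 prior, and tosses that are independent given the hypothesis; these modelling choices and the name `update` are illustrative assumptions, not part of the source.

```python
from fractions import Fraction


def update(prior, likelihood):
    """Apply Bayes' theorem over exclusive, exhaustive hypotheses.

    prior[i] is p(A_i|X), likelihood[i] is p(E|A_i X);
    the result is the list of posteriors p(A_i|EX).
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)  # p(E|X), by the chain rule
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical hypotheses: the coin lands heads with probability 1/4, 1/2 or 3/4.
heads_prob = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

# Rule 4: no hypothesis is preferred by the initial information X.
belief = [Fraction(1, len(heads_prob))] * len(heads_prob)

# Assumed evidence: heads, heads, tails, processed one toss at a time.
for toss in "HHT":
    likelihood = [h if toss == "H" else 1 - h for h in heads_prob]
    belief = update(belief, likelihood)

print(belief)
```

Under the independence assumption, processing the tosses one at a time gives the same posterior as a single update with the likelihood of the whole sequence.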
Enumerate the possible decisions $D_1, \ldots, D_k$ and introduce the loss function $L(D_i, \theta_j)$ representing the “loss” incurred by making decision $D_i$ if $\theta_j$ is the true state of nature. After accumulating new evidence $E$, make the decision $D_i$ that minimizes the expected loss over the posterior distribution of $\theta_j$:
Choose the decision $D_i$ which minimizes
$$\langle L \rangle_i = \sum_j L(D_i, \theta_j)\,p(\theta_j \mid EX)$$
That is, choose $D_i$ such that $\langle L \rangle_i$ is minimized.
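The decision rule amounts to a matrix-vector product followed by a minimum. The sketch below assumes three decisions, three states of nature, and an invented loss table and posterior; `expected_losses` and `best_decision` are hypothetical helper names.

```python
from fractions import Fraction


def expected_losses(loss, posterior):
    """Return <L>_i = sum_j L(D_i, theta_j) p(theta_j|EX) for every decision i."""
    return [
        sum(l_ij * p_j for l_ij, p_j in zip(row, posterior)) for row in loss
    ]


def best_decision(loss, posterior):
    """Return the index i of the decision D_i with the smallest expected loss."""
    losses = expected_losses(loss, posterior)
    return min(range(len(losses)), key=losses.__getitem__)


# Hypothetical loss table: loss[i][j] = L(D_i, theta_j).
loss = [
    [0, 1, 4],  # D_1
    [2, 0, 1],  # D_2
    [5, 2, 0],  # D_3
]

# Hypothetical posterior p(theta_j|EX) over the three states of nature.
posterior = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]

losses = expected_losses(loss, posterior)
choice = best_decision(loss, posterior)
print(losses, "-> choose D_%d" % (choice + 1))
```

Here the expected losses are 17/10, 7/10 and 2, so $D_2$ is chosen. If several decisions tie, `min` returns the first of them, which is an arbitrary tie-breaking convention.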