Eric W. Bridgeford and Jaewon Chung
For questions, reach out to ericwb95@gmail.com or j1c@jhu.edu
Follow the slides: ericwb.me/lectures/intro_graph_ML.html
Person | Biological Sex | Height | Age |
---|---|---|---|
Person 1 | Male | 5′9" | 28 |
Person 2 | Female | 5′5" | 24 |
... | ... | ... | ... |
$x_i \sim \text{Bern}(p)$ i.i.d.; heads (value 1) and tails (value 0)
$L(x_1, \dots, x_{15}) = \prod_{i=1}^{15} p^{x_i} (1 - p)^{1 - x_i}$
Take the derivative of $\ell = \log(L)$ and set it equal to zero $\Rightarrow \hat{p} = \frac{1}{15} \sum_{i=1}^{15} x_i$ is the MLE
Check that we found a maximum (second derivative, check extrema)
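As a quick numerical check, here is a minimal numpy sketch (the 15 flips below are made-up data):

```python
import numpy as np

# 15 hypothetical coin flips (1 = heads, 0 = tails)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1])

# the MLE is the sample mean: p_hat = (1/15) * sum_i x_i
p_hat = x.mean()
print(p_hat)  # 0.666...
```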
$d_i = \text{degree}(i) = \sum_{j=1}^n a_{ij}$
$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & d_n \end{bmatrix} = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix}$
$D^{-1} = \begin{bmatrix} \frac{1}{d_1} & & \\ & \ddots & \\ & & \frac{1}{d_n} \end{bmatrix}$
$L = D^{-\frac{1}{2}} A D^{-\frac{1}{2}} = \begin{bmatrix} \frac{1}{\sqrt{d_1}} & & \\ & \ddots & \\ & & \frac{1}{\sqrt{d_n}} \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{d_1}} & & \\ & \ddots & \\ & & \frac{1}{\sqrt{d_n}} \end{bmatrix} = \begin{bmatrix} \frac{a_{11}}{d_1} & \cdots & \frac{a_{1n}}{\sqrt{d_1 d_n}} \\ \vdots & \ddots & \vdots \\ \frac{a_{n1}}{\sqrt{d_n d_1}} & \cdots & \frac{a_{nn}}{d_n} \end{bmatrix}$
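These definitions translate directly to numpy; a minimal sketch on a hypothetical 3-node network:

```python
import numpy as np

# hypothetical adjacency matrix of a 3-node network
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])

d = A.sum(axis=1)                     # d_i = sum_j a_ij
D_inv_sqrt = np.diag(1 / np.sqrt(d))  # D^{-1/2}
L = D_inv_sqrt @ A @ D_inv_sqrt       # L_ij = a_ij / sqrt(d_i d_j)
```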
(Source: Muse)
(Source: Schiffler 2017)
(Source: Schiffler 2017/me)
Person | Biological Sex | Height | Age |
---|---|---|---|
Person 1 | Male | 5′9" | 28 |
Person 2 | Female | 5′5" | 24 |
... | ... | ... | ... |
$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$
$\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$ or
$\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ or
$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
$P = CBC^\top$
$C$: one-hot encoding of the community assignment vector $\vec{z}$
$CB$ is an $n \times K$ matrix times a $K \times K$ matrix, so $CB$ is $n \times K$
$CB = \begin{bmatrix} c_{11} & \dots & c_{1K} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nK} \end{bmatrix} \begin{bmatrix} b_{11} & \dots & b_{1K} \\ \vdots & \ddots & \vdots \\ b_{K1} & \dots & b_{KK} \end{bmatrix}$
The matrix product $AB$ is defined, when $A$ has $n$ rows and $m$ columns and $B$ has $m$ rows and $w$ columns, to be:
$AB = \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1w} \\ \vdots & \ddots & \vdots \\ b_{m1} & \cdots & b_{mw} \end{bmatrix}$
But: $c_{il} = \begin{cases} 1, & z_i = l \\ 0, & z_i \neq l \end{cases}$
Applying this, row $i$ of $CB$ is just row $z_i$ of $B$:
$CB = \begin{bmatrix} b_{z_1 1} & \cdots & b_{z_1 K} \\ \vdots & & \vdots \\ b_{z_n 1} & \cdots & b_{z_n K} \end{bmatrix}$
Right-multiply $CB$ by $C^\top$:
$CBC^\top = \begin{bmatrix} b_{z_1 1} & \cdots & b_{z_1 K} \\ \vdots & & \vdots \\ b_{z_n 1} & \cdots & b_{z_n K} \end{bmatrix} \begin{bmatrix} c_{11} & \cdots & c_{n1} \\ \vdots & & \vdots \\ c_{1K} & \cdots & c_{nK} \end{bmatrix}$
$(CBC^\top)_{ij} = \sum_{l=1}^K b_{z_i l} c_{jl}$
Since $c_{jl} = 1$ when $z_j = l$ and 0 otherwise:
$(CBC^\top)_{ij} = b_{z_i z_j}$
$P = CBC^\top = \begin{bmatrix} b_{z_1 z_1} & \cdots & b_{z_1 z_n} \\ \vdots & \ddots & \vdots \\ b_{z_n z_1} & \cdots & b_{z_n z_n} \end{bmatrix}$
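A minimal numpy sketch of this construction, with a made-up assignment vector and block matrix (communities 0-indexed for convenience):

```python
import numpy as np

z = np.array([0, 0, 0, 1, 1, 1])   # hypothetical community assignments
B = np.array([[0.8, 0.2],
              [0.2, 0.5]])         # hypothetical block matrix

C = np.eye(2)[z]                   # one-hot encoding of z (n x K)
P = C @ B @ C.T                    # P_ij = b_{z_i z_j}

assert P[0, 3] == B[0, 1]          # node 0 is in community 0, node 3 in community 1
```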
$X = \begin{bmatrix} \leftarrow & x_1^\top & \rightarrow \\ & \vdots & \\ \leftarrow & x_n^\top & \rightarrow \end{bmatrix}$
$\mathbb{E}[a_{ij}] = \sum_{a \in \{0, 1\}} a \cdot P(a_{ij} = a) = 0 \cdot (1 - p_{ij}) + 1 \cdot p_{ij} = p_{ij}$
$\mathbb{E}[d_i; z_i = k] = \sum_{j \neq i} \mathbb{E}[a_{ij}; z_i = k]$
$\mathbb{E}[d_i; z_i = k] = \sum_{l=1}^K \sum_{j : j \neq i, z_j = l} b_{kl} = \sum_{j : j \neq i, z_j = k} b_{kk} + \sum_{l \neq k} \sum_{j : z_j = l} b_{kl}$
$\mathbb{E}[d_i; z_i = k] = b_{kk} \sum_{j : j \neq i, z_j = k} 1 + \sum_{l \neq k} b_{kl} \sum_{j : z_j = l} 1$
$\mathbb{E}[d_i; z_i = k] = (n_k - 1) b_{kk} + \sum_{l \neq k} n_l b_{kl}$
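To make the formula concrete, a small sketch that evaluates it for the same made-up block matrix and community sizes:

```python
import numpy as np

n_k = np.array([3, 3])              # hypothetical community sizes
B = np.array([[0.8, 0.2],
              [0.2, 0.5]])
k = 0                               # community of node i

# E[d_i ; z_i = k] = (n_k - 1) b_kk + sum_{l != k} n_l b_kl
expected_degree = (n_k[k] - 1) * B[k, k] + sum(
    n_k[l] * B[k, l] for l in range(len(n_k)) if l != k
)
print(expected_degree)              # 2 * 0.8 + 3 * 0.2 = 2.2
```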
(Source: kbucha73)
$\mathbb{E}[d_i; z_i = k] = \theta_i \left( \sum_{l \neq k} \left[ b_{kl} \sum_{j : z_j = l} \theta_j \right] + b_{kk} \sum_{j \neq i : z_j = k} \theta_j \right)$
(Handout page 64)
A matrix $R$ with real entries is positive semi-definite (PSD) if, for some other matrix $Q$ with real entries, we can write:
$R = QQ^\top$
SBM: $P = (C\sqrt{B})(C\sqrt{B})^\top = XX^\top$ where $X = C\sqrt{B}$
DCSBM: $P = (\Theta C\sqrt{B})(\Theta C\sqrt{B})^\top = XX^\top$ where $X = \Theta C\sqrt{B}$
If $B$ is a $2 \times 2$ symmetric real matrix,
$B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$,
then $B$ is PSD if $b_{11} \geq 0$, $b_{22} \geq 0$, and the determinant $\det(B) = b_{11} b_{22} - b_{12} b_{21} \geq 0$.
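A sketch of checking PSD-ness and building $X = C\sqrt{B}$ numerically (toy values as before; `scipy.linalg.sqrtm` computes a matrix square root):

```python
import numpy as np
from scipy.linalg import sqrtm

B = np.array([[0.8, 0.2],
              [0.2, 0.5]])

# B is PSD iff all eigenvalues of the symmetric matrix are non-negative
assert np.all(np.linalg.eigvalsh(B) >= 0)

z = np.array([0, 0, 0, 1, 1, 1])
C = np.eye(2)[z]
X = C @ sqrtm(B)                           # X = C sqrt(B)
assert np.allclose(X @ X.T, C @ B @ C.T)   # X X^T recovers P = C B C^T
```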
If $A^{(1)}$ and $A^{(2)}$ are $IER_n(P)$ networks with probability matrices $P^{(1)}$ and $P^{(2)}$, we call them stochastically equivalent if $P^{(1)} = P^{(2)}$
Recall that if $B$ is PSD, for a random network $A^{(1)} \sim SBM_n(\vec{z}, B)$:
SBM: $P^{(1)} = (C\sqrt{B})(C\sqrt{B})^\top = XX^\top$ where $X = C\sqrt{B}$
So $A^{(2)} \sim RDPG_n(X)$ where $X = C\sqrt{B}$ is stochastically equivalent to $A^{(1)}$
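If you have graspologic installed, a hedged sketch of sampling both models (the `sbm` and `rdpg` samplers live in `graspologic.simulations`; toy parameters as before):

```python
import numpy as np
from scipy.linalg import sqrtm
from graspologic.simulations import sbm, rdpg

B = np.array([[0.8, 0.2],
              [0.2, 0.5]])
X = np.eye(2)[np.array([0, 0, 0, 1, 1, 1])] @ sqrtm(B)   # X = C sqrt(B)

A1 = sbm(n=[3, 3], p=B)   # A^(1) ~ SBM_n(z, B)
A2 = rdpg(X)              # A^(2) ~ RDPG_n(X): same P, so stochastically equivalent
```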
$R = Q \Lambda Q^\top$
$P = Q_d \Lambda_d Q_d^\top = (Q_d \sqrt{\Lambda_d})(Q_d \sqrt{\Lambda_d})^\top = XX^\top$ with
$X = Q_d \sqrt{\Lambda_d}$ (informally, a "square root" of $P$) and
$\Lambda_d = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix}$
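A minimal numpy sketch of this adjacency spectral embedding (assumes the top-$d$ eigenvalues are non-negative; the absolute value is only a guard against small negative eigenvalues):

```python
import numpy as np

def ase(A, d):
    """Embed a symmetric adjacency matrix via its top-d eigenpairs."""
    eigvals, Q = np.linalg.eigh(A)               # A = Q Lambda Q^T
    top = np.argsort(np.abs(eigvals))[::-1][:d]  # indices of the top-d eigenvalues
    return Q[:, top] @ np.diag(np.sqrt(np.abs(eigvals[top])))  # X = Q_d sqrt(Lambda_d)
```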
$P = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
$P = YY^\top = XW(XW)^\top = XWW^\top X^\top = X I_d X^\top = XX^\top$
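A quick numpy demonstration of this non-identifiability, using an arbitrary rotation matrix $W$:

```python
import numpy as np

theta = 0.7                                      # arbitrary angle
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: W W^T = I

X = np.random.uniform(0.1, 0.5, size=(5, 2))     # arbitrary latent positions
Y = X @ W
assert np.allclose(Y @ Y.T, X @ X.T)             # identical P, different latent positions
```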
$L = XX^\top, \quad X = U_d \sqrt{\Sigma_d}$
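And the corresponding Laplacian spectral embedding sketch, reusing the $L = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ definition from earlier:

```python
import numpy as np

def lse(A, d):
    """Embed the normalized Laplacian via its top-d singular pairs."""
    D_inv_sqrt = np.diag(1 / np.sqrt(A.sum(axis=1)))
    L = D_inv_sqrt @ A @ D_inv_sqrt
    U, s, _ = np.linalg.svd(L)                 # L = U Sigma V^T
    return U[:, :d] @ np.diag(np.sqrt(s[:d]))  # X = U_d sqrt(Sigma_d)
```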
Clusters | GMM ∘ ASE | GMM ∘ LSE |
---|---|---|
1 | right GM, left GM | right GM, right WM |
2 | right WM, left WM | left GM, left WM |
$B^{(1)} = \begin{bmatrix} b_{11}^{(1)} & b_{12}^{(1)} & b_{13}^{(1)} \\ b_{21}^{(1)} & b_{22}^{(1)} & b_{23}^{(1)} \\ b_{31}^{(1)} & b_{32}^{(1)} & b_{33}^{(1)} \end{bmatrix}, \quad B^{(2)} = \begin{bmatrix} b_{11}^{(2)} & b_{12}^{(2)} & b_{13}^{(2)} \\ b_{21}^{(2)} & b_{22}^{(2)} & b_{23}^{(2)} \\ b_{31}^{(2)} & b_{32}^{(2)} & b_{33}^{(2)} \end{bmatrix}$
$H_{0,kl}: b_{kl}^{(1)} = b_{kl}^{(2)}$; $H_{A,kl}: b_{kl}^{(1)} \neq b_{kl}^{(2)}$
  | night time | day time |
---|---|---|
Number of edges | a | b |
Number of non-edges | c | d |
Intuitively, this tests whether the proportion of edges is independent of the time of day.
Now we can compute a p-value for each $H_{0,kl}$ hypothesis!
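`scipy.stats.fisher_exact` runs this test directly; a sketch with made-up counts:

```python
import numpy as np
from scipy.stats import fisher_exact

# rows: edges / non-edges; columns: night time / day time (hypothetical counts)
table = np.array([[20, 35],
                  [80, 65]])

odds_ratio, p_value = fisher_exact(table)
print(p_value)
```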
What about multiple comparisons?
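One common remedy is to adjust the p-values, e.g. with Holm-Bonferroni; a hedged sketch using `statsmodels` (the p-values below are made up):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.04, 0.03, 0.20, 0.008, 0.65])  # one per (k, l) pair

# Holm-Bonferroni controls the family-wise error rate across all the tests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")
```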
Are the latent positions from two networks different?
$H_0: X = YW$; $H_A: X \neq YW$
Are the distributions of latent positions from two networks different?
$H_0: F_X = F_Y W$; $H_A: F_X \neq F_Y W$
Which latent positions are different from two networks?
$H_0: X_i = Y_i W$; $H_A: X_i \neq Y_i W$
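graspologic ships bootstrap tests along these lines; a hedged sketch of the latent position test on two networks sampled from the same SBM (so $H_0$ should hold):

```python
import numpy as np
from graspologic.inference import latent_position_test
from graspologic.simulations import sbm

B = np.array([[0.8, 0.2],
              [0.2, 0.5]])
A1 = sbm(n=[50, 50], p=B)
A2 = sbm(n=[50, 50], p=B)

# tests H0: X = YW against HA: X != YW; the result includes a p-value
result = latent_position_test(A1, A2)
```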
$x_j^{(i)} = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases}$
$p_i = P^\top x^{(i)} = \begin{bmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{bmatrix}^\top \begin{bmatrix} x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} = \begin{bmatrix} p_{11} & \cdots & p_{n1} \\ \vdots & \ddots & \vdots \\ p_{1n} & \cdots & p_{nn} \end{bmatrix} \begin{bmatrix} x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix}$
Continuing, we get:
$p_i = P^\top x^{(i)} = \begin{bmatrix} \sum_{j=1}^n p_{j1} x_j^{(i)} \\ \vdots \\ \sum_{j=1}^n p_{jn} x_j^{(i)} \end{bmatrix} = \begin{bmatrix} p_{i1} \\ \vdots \\ p_{in} \end{bmatrix}$ because $x_i^{(i)} = 1$ is the only entry that is non-zero (the other sum terms drop out)
$x \sim \text{Multinoulli}(\vec{p})$ if $x$ takes the value $j$ with probability $p_j$ (a single draw from a categorical distribution)
Let $s_{t+1} \sim \text{Multinoulli}(\vec{p}_{s_t})$
```python
import numpy as np

def random_walk(P, s0, T):
    """
    P: the n x n transition probability matrix.
    s0: the source node.
    T: the walk length.
    """
    n = P.shape[0]
    s = [s0]
    for t in range(1, T + 1):
        # x: one-hot indicator of the current node s[t - 1]
        x = np.zeros(n)
        x[s[t - 1]] = 1
        # p: transition probabilities out of the current node
        p = P.T @ x
        # s[t]: a Multinoulli sample with probability vector p
        s.append(np.random.choice(n, p=p))
    return s
```
```python
def generate_corpus(P, T, R):
    """
    P: the n x n transition probability matrix.
    T: the walk length.
    R: the number of walks per node in the network.
    """
    n = P.shape[0]
    walks = []
    for i in range(n):       # each source node
        for r in range(R):   # R walks per source node
            walks.append(random_walk(P, i, T))
    return walks
```
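Putting the two together on a toy network, with the standard row-normalized choice $P = D^{-1} A$ as the transition matrix:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
P = np.diag(1 / A.sum(axis=1)) @ A      # row-normalize: P = D^{-1} A

walks = generate_corpus(P, T=10, R=5)   # 5 walks of length 10 from each node
print(walks[0])
```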
$W^\top x^{(i)} = \begin{bmatrix} \uparrow & & \uparrow \\ w^{(1)} & \cdots & w^{(n)} \\ \downarrow & & \downarrow \end{bmatrix} \begin{bmatrix} x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^n w_1^{(j)} x_j^{(i)} \\ \vdots \\ \sum_{j=1}^n w_d^{(j)} x_j^{(i)} \end{bmatrix} = \begin{bmatrix} w_1^{(i)} \\ \vdots \\ w_d^{(i)} \end{bmatrix} = w^{(i)}$
$\sigma(y)_j = \frac{\exp(y_j)}{\sum_{j'=1}^n \exp(y_{j'})}$
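A two-line numpy version, with the usual max-subtraction for numerical stability:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - y.max())   # subtracting the max doesn't change the result
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])).sum())  # 1.0
```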