M value: analysis of red probes, identification of adjustment variables

Percent variance explained of the "red" dataset:
[Figure: percent variance explained by each eigengene of the "red" dataset]
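The percent variance explained can be derived from the singular values of the data matrix. The following is only a minimal sketch: it assumes `red` is a probes x samples matrix of M values and that the eigengenes come from `svd()`, as the `mredsvd` object used further down suggests. The simulated data and the row-centering step are assumptions, not taken from the original analysis.

```r
# Simulated stand-in for the real data: 200 probes x 30 samples (assumption)
set.seed(1)
red <- matrix(rnorm(200 * 30), nrow = 200, ncol = 30)

# Center each probe (row) so the SVD describes variation around the probe means
redc <- red - rowMeans(red)

# Columns of mredsvd$v are the eigengenes (one value per sample)
mredsvd <- svd(redc)

# Squared singular values are proportional to the variance captured by each eigengene
pve <- 100 * mredsvd$d^2 / sum(mredsvd$d^2)

# Percent variance explained by the first four eigengenes
round(pve[1:4], 2)
```

Calling `barplot(pve)` on the result would give a plot of the same kind as the figure above.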
Potential adjustment variables: batch (sixth field in the patient ID barcode); center (second field in the patient ID barcode); day, month and year of shipment; concentration; amount; plate row; plate column. Potential biological variables: tumor stage, tumor grade, age.
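Batch and center can be pulled out of the barcode by splitting it into fields. This is a sketch only: the barcodes below are made up, and the hyphen delimiter is an assumption about the barcode format rather than something stated on this page.

```r
# Hypothetical hyphen-separated patient ID barcodes (made up for illustration)
barcode <- c("XX-02-0001-01A-01D-0182-05",
             "XX-02-0003-01A-01D-0182-05",
             "XX-06-0125-01A-01D-0214-05")

# Split each barcode into its fields; the "-" delimiter is an assumption
fields <- strsplit(barcode, "-", fixed = TRUE)

# Second field = center, sixth field = batch, as described above
center <- factor(sapply(fields, `[`, 2))
batch  <- factor(sapply(fields, `[`, 6))

# Cross-tabulate to see how strongly center and batch are confounded
table(center, batch)
```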

Association of the first four eigengenes of the red probes with batch (Kruskal-Wallis test; output shown for eigengenes 2 to 4):

Code Block
> kruskal.test(mredsvd$v[,2],as.factor(batch))

        Kruskal-Wallis rank sum test

data:  mredsvd$v[, 2] and as.factor(batch)
Kruskal-Wallis chi-squared = 10.8903, df = 12, p-value = 0.5383

> kruskal.test(mredsvd$v[,3],as.factor(batch))

        Kruskal-Wallis rank sum test

data:  mredsvd$v[, 3] and as.factor(batch)
Kruskal-Wallis chi-squared = 21.7447, df = 12, p-value = 0.04048

> kruskal.test(mredsvd$v[,4],as.factor(batch))

        Kruskal-Wallis rank sum test

data:  mredsvd$v[, 4] and as.factor(batch)
Kruskal-Wallis chi-squared = 35.8388, df = 12, p-value = 0.0003439
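The individual calls above can be collapsed into a loop that collects only the p-values. A minimal sketch on simulated data: the batch labels, the data matrix and the artificial batch shift are all assumptions made so that the example runs on its own.

```r
set.seed(2)
n <- 60
batch <- rep(1:6, each = 10)                 # simulated batch labels (assumption)
red <- matrix(rnorm(100 * n), nrow = 100)    # simulated probes x samples matrix
red <- sweep(red, 2, 0.5 * batch, "+")       # add an artificial batch shift to every probe

# Eigengenes of the row-centered matrix
mredsvd <- svd(red - rowMeans(red))

# Kruskal-Wallis p-value of each of the first four eigengenes against batch
pvals <- sapply(1:4, function(k) {
  kruskal.test(mredsvd$v[, k], as.factor(batch))$p.value
})
names(pvals) <- paste0("PC", 1:4)
signif(pvals, 4)
```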

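After removing the batch effect (see below), the p-value for the association of the first principal component with batch is 0.7748.

Correlation of the first four principal components with the adjustment and bio variables before removing the batch effect (p-values; Kruskal-Wallis test for categorical variables and Spearman correlation for age):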

| PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 0.0004486 | 5.882547e-02 | 6.881028e-02 | 0.32 | 0.27 | 0.1071 |
| 2 | 0.5383 | 0.3486 | 0.5577 | 0.9876 | 0.2710 | 0.04873 | 0.6482 | 0.2862026 | 0.4786892 | 0.31 | 0.10 | 0.006634 |
| 3 | 0.04048 | 0.05258 | 0.03756 | 0.01480 | 0.1233 | 0.1786 | 0.5335 | 0.55585676 | 0.25289498 | 0.50 | 0.35 | 0.5131 |
| 4 | 0.0003439 | 0.01709 | 0.0008948 | 0.0001387 | 0.5725 | 0.7225 | 0.5267 | 0.0516508987 | 0.1404578746 | 0.43 | 0.43 | 0.02168 |
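A table of this kind can be assembled by choosing the test according to the type of each variable. The sketch below runs on simulated annotations and eigengenes; the helper `assoc_p` and all variable names are hypothetical and only illustrate the Kruskal-Wallis / Spearman split described above.

```r
set.seed(3)
n <- 60
# Simulated sample annotations (assumption): two categorical, one continuous
covars <- data.frame(
  batch = factor(rep(1:6, each = 10)),
  stage = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  age   = round(runif(n, 40, 80))
)
# Simulated eigengenes: first four right singular vectors of a random matrix
v <- svd(matrix(rnorm(100 * n), nrow = 100))$v[, 1:4]

# Hypothetical helper: p-value for one eigengene against one covariate
assoc_p <- function(pc, covar) {
  if (is.numeric(covar)) {
    # Continuous variable: Spearman rank correlation test
    cor.test(pc, covar, method = "spearman", exact = FALSE)$p.value
  } else {
    # Categorical variable: Kruskal-Wallis rank sum test
    kruskal.test(pc, as.factor(covar))$p.value
  }
}

# Rows = PCs, columns = covariates
ptab <- sapply(covars, function(covar) {
  apply(v, 2, function(pc) assoc_p(pc, covar))
})
rownames(ptab) <- paste0("PC", 1:4)
signif(ptab, 4)
```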

From previous work with Brig, we found that day, month and year of shipment, as well as center, are highly correlated with batch. We therefore start by removing the batch effect: each probe is regressed on batch and the residuals are kept. Percent variance explained after removing the batch effect:

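Code Block
> # Design matrix for batch: intercept plus one indicator per non-reference batch
> X <- model.matrix(~factor(batch))
> # Least-squares coefficients for every probe (red is probes x samples)
> Xbc <- solve(t(X) %*% X) %*% t(X) %*% t(red)
> # Residuals: M values with the fitted batch means removed
> redB <- red - t(X %*% Xbc)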
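As a sanity check, the same residuals can be obtained from `lm()`, and the residual mean of every probe should be zero within each batch. This is a sketch on simulated data; the batch labels, matrix sizes and the artificial batch shift are assumptions.

```r
set.seed(4)
n <- 40
batch <- rep(1:4, each = 10)               # simulated batch labels (assumption)
red <- matrix(rnorm(50 * n), nrow = 50)    # simulated probes x samples matrix
red <- sweep(red, 2, 0.5 * batch, "+")     # add an artificial batch shift

# Normal-equations version, as in the code block above
X <- model.matrix(~factor(batch))
Xbc <- solve(t(X) %*% X) %*% t(X) %*% t(red)
redB <- red - t(X %*% Xbc)

# Same residuals via lm() with a matrix response (one regression per probe)
redB_lm <- t(residuals(lm(t(red) ~ factor(batch))))

# The two versions agree up to floating-point error
max(abs(redB - redB_lm))

# Per-batch mean of the residuals for every probe (10 samples per batch here)
batch_means <- t(rowsum(t(redB), batch)) / 10
max(abs(batch_means))
```

Because the design matrix includes an intercept, the residuals also have each probe's overall mean removed. Zero batch means in the residuals are consistent with the large batch p-values seen after adjustment.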

[Figure: percent variance explained by each eigengene after removing the batch effect]
Looks like the first eigengene now explains ~20% of the variance. Correlation with the adjustment and bio variables after removing the batch effect:

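| PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.7748 | 9.445e-05 | 6.8e-01 | 8.2e-01 | 8.0e-01 | 5.4e-01 | 1.1e-01 | 1.3e-07 | 9.4e-05 | 0.541 | 0.093 | 0.6379 |
| 2 | 1 | 0.4722 | 1.00 | 1.00 | 0.98 | 0.78 | 0.76 | 0.26 | 0.45 | 0.18 | 0.11 | 0.01303 |
| 3 | 1 | 0.03917 | 1.00 | 1.00 | 0.94 | 0.98 | 0.45 | 0.59 | 0.15 | 0.60 | 0.24 | 0.06834 |
| 4 | 1 | 0.3475 | 1.000 | 1.000 | 0.998 | 0.969 | 0.428 | 0.055 | 0.104 | 0.56 | 0.73 | 0.1463 |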

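Removing the batch effect did not completely remove the center effect (PC1 p = 9.445e-05, PC3 p = 0.03917) or the plate row and column effects (PC1 p = 1.3e-07 and 9.4e-05, respectively). Day, month and year of shipment have been taken care of: none of them is associated with the first four PCs any more (all p >= 0.68).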