Module ECUE «Applied AI»
AI & Biomedical: Big data in bioimagery
Xavier Descombes, Morpheme team, INRIA/I3S/iBV
High Spatial Resolution Multiscale
• Microscopy images:
– Spatial resolution in x/y: below 1 µm
– 2D or 3D datasets: up to several hundreds of slices
Example: mouse brain image acquired with light-sheet microscopy, X = 0.75 µm, Y = 0.75 µm, Z = 1.99 µm
6000 x 6000 x 1000 voxels, voxels coded on 16 bits, study on 2 channels + time course, 20-300 GB, about 40 brains
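The figures above can be checked with a quick back-of-the-envelope computation. This is only a sketch: it assumes uncompressed storage and 1 GB = 10^9 bytes, neither of which is stated on the slide.

```python
# Raw size of one acquisition (assumption: uncompressed storage).
nx, ny, nz = 6000, 6000, 1000      # voxels per axis, from the slide
bytes_per_voxel = 16 // 8          # 16-bit voxels
channels = 2

voxels = nx * ny * nz
size_one_channel_gb = voxels * bytes_per_voxel / 1e9   # 1 GB = 1e9 bytes (assumption)
size_two_channels_gb = size_one_channel_gb * channels

print(f"{size_one_channel_gb:.0f} GB per channel, "
      f"{size_two_channels_gb:.0f} GB for {channels} channels")
```

A single time point already falls inside the quoted range; a time course multiplies this figure by the number of time points.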
Histopathology data
Pyramidal structure of the data (Whole Slide Imaging format)
Histopathological data
Pyramid levels, in pixels: (822 x 1,365) (13,152 x 21,840) (52,608 x 87,360)
First goal: focus on suspicious areas
Multiscale approach:
First analyse the low-resolution image so that only small areas of the high-resolution image need to be considered (for example, to reduce the size of a CNN's first layer)
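One way to picture the multiscale approach is as a coordinate mapping between pyramid levels: regions found on the coarse level are scaled up to select the matching area on the fine level. The sketch below is illustrative only. The function name `to_high_res` and the example box are hypothetical, and the level sizes are read as (rows, columns), which is an assumption.

```python
def to_high_res(box, low_shape, high_shape):
    """Map a (row0, col0, row1, col1) box from the low-resolution level
    to the corresponding box on the high-resolution level."""
    sr = high_shape[0] / low_shape[0]   # row scale factor
    sc = high_shape[1] / low_shape[1]   # column scale factor
    r0, c0, r1, c1 = box
    return (round(r0 * sr), round(c0 * sc), round(r1 * sr), round(c1 * sc))

low_shape = (822, 1365)        # coarsest pyramid level
high_shape = (52608, 87360)    # finest pyramid level

# Hypothetical suspicious region detected on the coarse level.
suspicious_low = [(100, 200, 110, 215)]
rois_high = [to_high_res(b, low_shape, high_shape) for b in suspicious_low]
print(rois_high)
```

Only the pixels inside `rois_high` would then be read at full resolution, which is what keeps the input of a later classifier small.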
Histopathology data analysis
• Goal: classify and grade the cancer
• Focus on ROIs (tumor zones)
• Medical decision based on local patterns
• Histopathologist analysis:
– Screen the low-resolution image to detect ROIs
– Zoom in on these ROIs (and come back) -> multiscale analysis -> qualitative analysis
A first machine learning task
• Classify ROIs (tumor areas)
Figure panels: initial image, tumor areas (ROI)
Challenge
§ Variability between and within images
§ Non-informative areas (fat, blood…)
§ Huge datasets (12 GB ≅ 100,000 pixels per axis)
Patch classification
Figure label: Tumor
Pre-processing: color deconvolution
Figure panels: RGB image, H channel, E channel
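Color deconvolution can be sketched as a linear unmixing in optical-density space. The example below is a minimal illustration, not necessarily the method used for these slides. It assumes NumPy is available and uses the stain vectors commonly quoted for hematoxylin, eosin and DAB; real slides may need vectors estimated from the data.

```python
import numpy as np

# Stain optical-density vectors (rows: hematoxylin, eosin, DAB).
# Commonly quoted values; treated here as an assumption.
STAINS = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11],
                   [0.27, 0.57, 0.78]])
STAINS = STAINS / np.linalg.norm(STAINS, axis=1, keepdims=True)

def color_deconvolution(rgb):
    """Return per-pixel stain amounts (H, E, third stain) for a uint8 RGB image."""
    od = -np.log10((rgb.astype(float) + 1.0) / 256.0)   # optical density
    conc = od.reshape(-1, 3) @ np.linalg.inv(STAINS)    # unmix: od = conc @ STAINS
    return conc.reshape(rgb.shape)

# Round trip on one synthetic pixel stained with hematoxylin only.
od_pixel = 1.0 * STAINS[0]
rgb = np.round(256.0 * 10.0 ** (-od_pixel) - 1.0).astype(np.uint8).reshape(1, 1, 3)
h, e, d = color_deconvolution(rgb)[0, 0]
print(round(h, 2), round(e, 2), round(d, 2))
```

The H channel is the first component of the result and the E channel the second; the synthetic pixel should come back with almost all of its signal in H.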
Reduce dataset
• Remove patches with few cells
Figure panels: RGB image, H channel, detected nuclei
Examples: 32 nuclei, 120 nuclei
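Removing patches with few cells comes down to counting detected nuclei and applying a threshold. The sketch below counts connected blobs in a binary mask as a stand-in for nuclei detection. The helper names and the threshold are hypothetical, and a real nuclei detector would be more elaborate.

```python
from collections import deque

def count_blobs(mask):
    """Count 4-connected components of truthy cells in a 2D list."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new blob found
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:                    # flood-fill the blob
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def keep_patch(mask, min_nuclei):
    """Keep a patch only if it contains at least `min_nuclei` blobs."""
    return count_blobs(mask) >= min_nuclei

# Toy thresholded H channel containing three blobs.
mask = [[1, 1, 0, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0]]
print(count_blobs(mask), keep_patch(mask, min_nuclei=3))
```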
Feature extraction: local binary patterns
Figure panels: image gray values (P = 8), RLBP
Rotation invariance
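The classical way to make a local binary pattern rotation invariant is to take the smallest value among all circular rotations of its bit string. The sketch below shows this for the 3 x 3 neighbourhood (P = 8, radius 1, an assumption). It may differ from the exact RLBP variant shown in the slides.

```python
def lbp_code(patch3x3):
    """8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch3x3[1][1]
    # Neighbours visited in circular order around the centre.
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(ring):
        if patch3x3[r][c] >= center:
            code |= 1 << bit
    return code

def rotation_invariant(code, p=8):
    """Smallest value among the p circular bit rotations of `code`."""
    mask = (1 << p) - 1
    return min(((code >> k) | (code << (p - k))) & mask for k in range(p))

patch = [[9, 1, 1],
         [1, 5, 1],
         [1, 1, 1]]
rotated = [[1, 1, 9],      # same patch rotated by 90 degrees
           [1, 5, 1],
           [1, 1, 1]]
print(lbp_code(patch), lbp_code(rotated))
print(rotation_invariant(lbp_code(patch)), rotation_invariant(lbp_code(rotated)))
```

The raw codes of the two patches differ, while their rotation-invariant codes coincide.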
Classification: k-means
Figure panels: RGB image, H channel, E channel
Result after nuclei detection
RLBP-H: 14 440 8700 … 745
RLBP-E: 945 560 163 … 12
P = 16, R = 3 (65,536 x 2 patterns)
Find the most frequent patterns from the training images
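Selecting the most frequent patterns turns a very long histogram into a short feature vector. The sketch below is illustrative, with hypothetical function names and toy pattern codes: it counts pattern codes over the training images, keeps the `n` most frequent, and describes a patch by the relative frequency of those codes.

```python
from collections import Counter

def most_frequent_patterns(training_codes, n):
    """training_codes: one list of pattern codes per training image.
    Returns the n codes that occur most often overall."""
    counts = Counter()
    for codes in training_codes:
        counts.update(codes)
    return [pattern for pattern, _ in counts.most_common(n)]

def histogram(codes, patterns):
    """Relative frequency of each selected pattern in one patch."""
    counts = Counter(codes)
    return [counts[p] / len(codes) for p in patterns]

# Toy pattern codes for two training images.
training_codes = [[1, 1, 3, 7], [1, 3, 3, 255]]
selected = most_frequent_patterns(training_codes, n=2)
features = histogram([1, 1, 3, 7], sorted(selected))
print(sorted(selected), features)
```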
Classification: K-NN
• Number of classes: 2 (Tumor/Not-tumor)
• Learning: 10 slides -> 850 patches
• Test: 5 slides
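A K-NN classifier assigns to a patch the majority label among its k closest training patches in feature space. The sketch below assumes Euclidean distance and k = 3, neither of which is specified on the slide, and uses made-up two-dimensional feature vectors.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training patches described by two histogram bins.
train = [((0.90, 0.10), "tumor"),
         ((0.80, 0.20), "tumor"),
         ((0.85, 0.15), "tumor"),
         ((0.10, 0.90), "not-tumor"),
         ((0.20, 0.80), "not-tumor")]

print(knn_predict(train, (0.70, 0.30)))
print(knn_predict(train, (0.15, 0.85)))
```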
Result on learning set
Figure panels: input image, ground truth, result
Result on test set
Figure panels: input image, ground truth, result
High Throughput data
PILOT SCREEN
Kinases and phosphatases (563)
RNA binding proteins (406)
~1000 out of 13000 genes
Figure labels: GFP-Imp+ cells, high-throughput confocal microscope, RNAi, GFP-Imp/DAPI
3 million images!
High Throughput data
First consider a PILOT screen (a subsample of well-chosen genes): the kinases, phosphatases and RNA binding proteins listed above, i.e. ~1000 out of 13000 genes
A pipeline for analysing the screen
1) Selection of healthy cells
2) Extraction of cell cytoplasm
3) Detection of Imp granules
4) Hit identification
Methods: segmentation, machine learning, classification
Figure: pipeline overview. Labels: healthy cells; raw image database; segmentation; DAPI, GFP-Imp; SPADE algorithm; GFP-Imp+ cells, GFP-Imp particles; statistics, clustering; GFP-Imp
Altered granule size, number, distribution?
Disciplines involved: biology, machine learning, image processing, databases
Machine learning task
• Classify the cells w.r.t. the granule population
• Features: number, size, spatial distribution
• Challenge: unknown number of classes
• Rejection class
• Unbalanced classes
• Huge number of samples: checking is difficult
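The three feature families listed above (number, size, spatial distribution) can be turned into a small per-cell descriptor. The sketch below is one possible choice, not the descriptor used in the screen: granules are assumed to be given as (x, y, area) triples, and the spatial term is the mean distance to the granule centroid.

```python
import math

def granule_features(granules):
    """granules: list of (x, y, area) for one cell.
    Returns (count, mean area, mean distance to the granule centroid)."""
    n = len(granules)
    if n == 0:
        return (0, 0.0, 0.0)
    mean_area = sum(a for _, _, a in granules) / n
    cx = sum(x for x, _, _ in granules) / n
    cy = sum(y for _, y, _ in granules) / n
    spread = sum(math.hypot(x - cx, y - cy) for x, y, _ in granules) / n
    return (n, mean_area, spread)

# Toy cell with two granules.
print(granule_features([(0.0, 0.0, 2.0), (4.0, 0.0, 4.0)]))
```

Such vectors could then feed a clustering step, where the unknown number of classes and the rejection class remain the open difficulties named in the list.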
Time sequences
Multiple particle tracking
• Main approaches in two steps:
1) Object (particle) detection
2) Object (particle) linking
• Particular case (for low speed):
» Trajectory detection in the (space + time) domain
The challenges
• Detection (see corresponding course)
• Appearance/disappearance of particles
• Crossing
• Occlusion
• Noise
• Location vs shape descriptor
Detect and Match
Figure: frames t and t+1, with points labeled $x_i$ and $y_j$
Match: nearest neighbor
$M(x_i) = \arg\min_{y_j} d(x_i, y_j)$
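Nearest-neighbor matching can be written in a few lines. The sketch below assumes Euclidean distance for d and uses made-up coordinates. It also shows the weakness of the rule: two points can select the same candidate.

```python
import math

def nearest_neighbor_match(xs, ys):
    """For each point x_i return the index j of the closest y_j."""
    return [min(range(len(ys)), key=lambda j: math.dist(x, ys[j])) for x in xs]

xs = [(0.0, 0.0), (1.0, 0.0)]      # first set of points
ys = [(0.6, 0.0), (5.0, 0.0)]      # second set of points
print(nearest_neighbor_match(xs, ys))
```

Both points are linked to the same candidate here, since nothing in the rule prevents it.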
Uniqueness constraint
Matching matrix: $M(i, j) \in \{0, 1\}$, $\sum_i M(i, j) \in \{0, 1\}$
Maximum velocity
$d(x_i, y_j) > V_{max} \Rightarrow M(i, j) = 0$
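Global optimization
$\arg\min_M \sum_{i \in I, j \in J} d(x_i, y_j) M(i, j)$
subject to $M(i, j) \in \{0, 1\}$ and $\forall i, \sum_j M(i, j) \in \{0, 1\}$
$d(x_i, y_j) = f(\|y_j - x_i\|)$
Figure: plot of $f(V)$ against $V$, with $V_{max}$ marked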
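For a handful of points, the global problem can be solved by exhaustive search, which makes the role of each constraint visible. The sketch below makes several assumptions: d is the Euclidean distance, links longer than `v_max` are forbidden, each candidate is used at most once, and an unmatched point pays a penalty equal to `v_max` (a hypothetical choice, so that leaving everything unmatched is not the cheapest solution). Real trackers would use a dedicated assignment solver instead.

```python
import math

def best_matching(xs, ys, v_max):
    """Exhaustive search for the assignment with minimal total cost.
    Returns (cost, match), where match[i] is the index j linked to
    x_i, or None if x_i is left unmatched."""
    best = (math.inf, None)

    def explore(i, used, cost, match):
        nonlocal best
        if cost >= best[0]:
            return                              # cannot beat the best so far
        if i == len(xs):
            best = (cost, match)
            return
        for j, y in enumerate(ys):
            d = math.dist(xs[i], y)
            if j not in used and d <= v_max:    # uniqueness + maximum velocity
                explore(i + 1, used | {j}, cost + d, match + [j])
        explore(i + 1, used, cost + v_max, match + [None])   # leave x_i unmatched

    explore(0, frozenset(), 0.0, [])
    return best

xs = [(0.0, 0.0), (1.0, 0.0)]
ys = [(0.6, 0.0), (1.8, 0.0)]
cost, match = best_matching(xs, ys, v_max=2.0)
print(round(cost, 3), match)
```

In this toy case both points are individually closest to the first candidate, yet the global solution links each point to a different one.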
Movement modeling
• Brownian motion: random movement (big particle in a fluid)
https://fr.wikipedia.org/wiki/Mouvement_brownien#Processus_d%E2%80%99Ornstein-Uhlenbeck
$P(x_{t+1} \mid x_t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_{t+1} - x_t)^2}{2\sigma^2}\right)$
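A short simulation helps to see what this transition law means in practice. The sketch below assumes a one-dimensional motion whose increments are Gaussian with standard deviation sigma; the function names and the seed are illustrative.

```python
import math
import random

def transition_density(x_next, x, sigma):
    """Gaussian transition density P(x_{t+1} | x_t) in one dimension."""
    var = sigma ** 2
    return math.exp(-(x_next - x) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def simulate(x0, sigma, steps, seed=0):
    """Simulate a path by adding independent Gaussian increments."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

path = simulate(x0=0.0, sigma=1.0, steps=10)
print(len(path))
print(transition_density(0.0, 0.0, 1.0) > transition_density(2.0, 0.0, 1.0))
```

The density can serve as a linking score: a candidate position far from the current one, relative to sigma, gets a low probability.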
Deterministic speed model
$x_{t+1} = x_t + V_t(x_t)\,dt + d\eta_{t+1}$
$V_t(x_t)$: deterministic model
$\eta_{t+1}$: fluctuation
Advantage of a model
Speed: learnt from a model or estimated from past steps
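When the speed is estimated from past steps, the simplest version uses the last two positions of a track to predict the next one, and candidates are compared with the prediction rather than with the last position. The sketch below assumes constant velocity and ignores the fluctuation term; the function names and coordinates are illustrative.

```python
import math

def predict_next(track, dt=1.0):
    """Predict the next position from the last two positions of a 2D track."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt      # speed from past steps
    return (x1 + vx * dt, y1 + vy * dt)

def link_with_model(track, candidates, dt=1.0):
    """Index of the candidate closest to the predicted position."""
    pred = predict_next(track, dt)
    return min(range(len(candidates)), key=lambda j: math.dist(pred, candidates[j]))

def link_without_model(track, candidates):
    """Index of the candidate closest to the last known position."""
    return min(range(len(candidates)), key=lambda j: math.dist(track[-1], candidates[j]))

track = [(0.0, 0.0), (1.0, 0.0)]
candidates = [(1.2, 0.3), (2.1, 0.0)]
print(link_without_model(track, candidates), link_with_model(track, candidates))
```

The two rules disagree on this toy example: the plain nearest neighbor picks the slow candidate, while the motion model picks the one that continues the trajectory.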
Gap filling
Figure: frames t, t+1, t+2
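Once a track has been linked across a frame with a missing detection, the missing position can be filled in. The sketch below uses linear interpolation between the neighbouring detections, which is an assumption; a motion model could be used instead.

```python
def fill_gaps(track):
    """track: dict mapping frame index to (x, y), possibly with missing frames.
    Returns a new dict where frames missing between two detections
    are filled by linear interpolation."""
    frames = sorted(track)
    filled = dict(track)
    for a, b in zip(frames, frames[1:]):
        (xa, ya), (xb, yb) = track[a], track[b]
        for f in range(a + 1, b):
            w = (f - a) / (b - a)                # position of f between a and b
            filled[f] = (xa + w * (xb - xa), ya + w * (yb - ya))
    return filled

# Detection missing at frame 1.
print(fill_gaps({0: (0.0, 0.0), 2: (2.0, 4.0)}))
```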
Graph model: minimal path
Figure labels: DA, linking cost
Tracklets
Two steps: local (tracklet detection), then tracklet merging
Pros: can take trajectory and/or speed models into account
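The merging step can use the speed observed at the end of one tracklet to decide whether another tracklet is its continuation. The sketch below is one possible test, with hypothetical thresholds `max_gap` and `max_dist` and a constant-velocity extrapolation.

```python
import math

def can_merge(a, b, max_gap, max_dist):
    """a, b: tracklets as lists of (frame, x, y) sorted by frame.
    True if b starts shortly after a ends and close to the position
    extrapolated from the last velocity of a."""
    (f0, x0, y0), (f1, x1, y1) = a[-2], a[-1]
    fb, xb, yb = b[0]
    gap = fb - f1
    if not 0 < gap <= max_gap:
        return False
    vx, vy = (x1 - x0) / (f1 - f0), (y1 - y0) / (f1 - f0)
    predicted = (x1 + vx * gap, y1 + vy * gap)
    return math.dist(predicted, (xb, yb)) <= max_dist

a = [(0, 0.0, 0.0), (1, 1.0, 0.0)]
b = [(3, 3.1, 0.0), (4, 4.0, 0.0)]     # continues the motion of a
c = [(3, 1.0, 3.0), (4, 1.0, 4.0)]     # starts far from the extrapolation
print(can_merge(a, b, max_gap=3, max_dist=0.5), can_merge(a, c, max_gap=3, max_dist=0.5))
```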
Take-home message
• Do not consider a higher resolution than needed (for space, time and intensity)
• Consider a multiscale approach
• Adapt the processing to the size of the data (compromise between accuracy and computation time)
• Consider parallel programming
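As an illustration of the last point, large images are often cut into tiles that are processed independently. The sketch below uses a thread pool from the Python standard library and a placeholder per-tile function. The tile size and worker count are arbitrary, and CPU-bound processing would typically use a process pool or a cluster instead.

```python
from concurrent.futures import ThreadPoolExecutor

def tiles(height, width, size):
    """Yield (row0, col0, row1, col1) boxes covering a height x width image."""
    for r in range(0, height, size):
        for c in range(0, width, size):
            yield (r, c, min(r + size, height), min(c + size, width))

def tile_area(box):
    """Placeholder for real per-tile processing: returns the tile area."""
    r0, c0, r1, c1 = box
    return (r1 - r0) * (c1 - c0)

boxes = list(tiles(1000, 1500, 512))
with ThreadPoolExecutor(max_workers=4) as pool:
    areas = list(pool.map(tile_area, boxes))   # results keep the order of `boxes`

print(len(boxes), sum(areas))
```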