Date posted: 18-Jan-2016
• 1: Experiments and Evaluation
  • a) HIVA: plot the cost, explore & exploit scores - DONE
    – measure correlation
    – observe greedy vs CELF selection
  • b) HIVA: different cost function (number of features present) - DONE
  • c) Temporal - 20 Newsgroup
• 2: Writing: compile papers into thesis proposal (highest priority, in parallel)
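The "measure correlation" item (explore vs. exploit scores) can be made concrete with a standard Pearson coefficient. A minimal sketch, assuming the two scores arrive as equal-length lists; the function name `pearson` and its inputs are illustrative, not part of the original setup:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores.

    Returns a value in [-1, 1]; values near 0 would back up a
    'visually uncorrelated' observation quantitatively.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A rank-based alternative (Spearman) is the same formula applied to the ranks of the scores, and is more robust when the explore and exploit scores live on different scales.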
Status: Experiments and Evaluation
• 20 Newsgroup: plot the cost, explore & exploit scores
  – Debug setup - DONE
  – Measure correlation - TODO
    • Exact measures not calculated yet, but visually the scores looked uncorrelated.
  – Observe greedy vs CELF selection
    • It is difficult to rationalize why CELF or greedy selected some claims over others, but we validated that the two selections share quite a few claims while also differing on some.
• HIVA
  – Debug setup - DONE
  – Measure correlation
    • Exact measures not calculated yet, but visually the scores looked uncorrelated.
  – Observe greedy vs CELF selection
  – Different cost function (number of features present) - running 4/3/13 (need to analyse)
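The greedy-vs-CELF comparison above can be sketched as two selection loops, assuming CELF here means the usual lazy-greedy (Cost-Effective Lazy Forward) evaluation of a submodular gain function; the function names and the coverage-style gain in the test are illustrative, not the thesis setup:

```python
import heapq

def greedy_select(candidates, gain, budget):
    """Plain greedy: re-evaluate every remaining candidate's marginal gain
    against the current selection at each step."""
    selected = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        selected.append(max(remaining, key=lambda c: gain(selected, c)))
    return selected

def celf_select(candidates, gain, budget):
    """CELF / lazy greedy: for a submodular gain, a candidate's marginal gain
    only shrinks as the selection grows, so cached gains are upper bounds and
    most re-evaluations can be skipped."""
    # Heap of (negated cached gain, tie-break index, candidate).
    heap = [(-gain([], c), i, c) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    selected = []
    while len(selected) < budget and heap:
        neg_cached, i, c = heapq.heappop(heap)
        fresh = gain(selected, c)           # re-evaluate against current set
        if heap and fresh < -heap[0][0]:    # cached bound is stale: re-queue
            heapq.heappush(heap, (-fresh, i, c))
        else:                               # still provably best: take it
            selected.append(c)
    return selected
```

For a truly submodular gain the two loops return the same set up to ties, which would be consistent with the observation that the selections share many claims; differences would come from tie-breaking or from score components that are not submodular.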
• Claims
  – Compile results - DONE
  – Run class-dependent experiment - DONE
    • 3 factors came out similar to 2-factor cost-sensitive exploitation (i.e. exploration did not help).
    • The class-dependent dynamic-cost setup significantly improves over the uniform-threshold dynamic-cost setup; class-dependent dynamic cost is the best-performing setup.
  – C:\mohit\official\temporal_activeLearning\interactive-sep12\claims\wlp_results_run4.xlsx
• Temporal - 20 Newsgroup - DONE (1 version)
  – Results are not great for 20 Newsgroup; no clear pattern.
• Class-dependent threshold, as a thought experiment for the value of dynamic cost
  – Experiment done for 20 Newsgroup with Pos 0.8 / Neg 0.001 - DONE
    • Results not good (the value of the dynamic cost function is shown by earlier results rather than these).
  – Compiled results for Pos9Neg5 for HIVA
    • Results not great.
• Noise robustness test
  – For 1 dataset (maybe 20 Newsgroup)
    • Introduce noise into 1 model, 2 models, or all models & observe the performance degradation.
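The degradation measurement could be sketched as below, assuming binary labels and a majority-vote combination of the models; the noise model (random label flips at a given rate) and all names are illustrative assumptions, not the thesis protocol:

```python
import random

def add_label_noise(preds, rate, rng):
    """Flip each binary prediction with probability `rate` (illustrative noise model)."""
    return [1 - p if rng.random() < rate else p for p in preds]

def majority_vote(preds_list):
    """Combine per-model binary predictions by simple majority."""
    return [1 if sum(col) * 2 > len(preds_list) else 0 for col in zip(*preds_list)]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def degradation(model_preds, labels, noisy_models, rate, seed=0):
    """Corrupt the predictions of the models listed (by index) in `noisy_models`
    and return the drop in ensemble accuracy relative to the clean ensemble."""
    rng = random.Random(seed)
    clean_acc = accuracy(majority_vote(model_preds), labels)
    noisy = [add_label_noise(p, rate, rng) if i in noisy_models else p
             for i, p in enumerate(model_preds)]
    return clean_acc - accuracy(majority_vote(noisy), labels)
```

Sweeping `noisy_models` over 1, 2, or all models (and `rate` over a grid) gives exactly the degradation curves the test above calls for.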
Writing: compile papers into thesis proposal (highest priority, in parallel)
C:\mohit\official\temporal_activeLearning\interactive-sep12\hiva\analysis\Greedy_ChtDyn_Fixed_ChtDynErrrRed_trial1
• Temporal data with interactive framework
  – Ran 20 Newsgroup with concept-drift probability 0.2 (CD0.2)
  – Results not great, as there is no desired pattern
Temporal data & framework - 20 Newsgroup - concept-drift probability 0.2 (CD0.2)
Analysis of Active Learning Curves
• RCV1: 'FCD\rcv1\analysis\activeLearningCurvesSVMv1.mat'
  – The number of queries is approximately the same across all active iterations.
  – The number of queries differs across strategies:
    • Different for model-fixed, 'conf-balanced' and 'dynErr-unsupActivepool'.
    • The rest are all similar, and equivalent to the expected average number of queries.
(Active learning curve plots for RCV1, 20 Newsgroup, Claims, and HIVA.)
Status - LibSVM + Liblinear experiments

Maximized Single Threshold:
Dataset                             | Threshold | Max Clique | Num Sets | Exp Run
20 News                             | 0.005     | 15         | 38519    | Done (need to analyse)
HIVA (very expensive)               | 6         | 10         | 7180     | Done
Claims-wlp                          | 3.2       | 12         | 10634    | Done (analyse) - 98209.9 sec (only dynamicCost)
Claims-wlp (dyn cost reduction 0.6) | 3.2       | 12         | 10634    | Ready to run (svn & run) - running on Aspen1, birch1, birch2
Claims-wlp                          | 3         |            | 2493     | Done
RCV1                                | 0.8       | 13         | 18928    | Done

Maximized Pos + Neg Threshold:
Dataset     | Threshold       | Max Clique | Num Sets | Exp Run
20 News     | Pos0.08Neg0.001 | 12         | 23802    | Ready to run (svn update & run) - Aspen2 running, rest done
Claims-wlp  | Pos4.8Neg3      | 14         | 31133    | Done (need to analyse) - 148242 sec (only dynamicCost)
Cost Analysis: HIVA - in-vitro number of experiments
Cost Analysis: HIVA - number of features
Thesis chapters - TODO 5/20/2013
• Temporal Active Learning
  – Merge the current KDD paper with FCD & SSD on the Claims data
    • Show the data characteristics of the Claims data
      – Test for FCD: plot the relative EOB-reason percentages across time iterations, i.e. a <# of EOB-code rows> x <# of time iterations = 23> matrix for each test case (10), where each cell <r, c> is (# of cases with EOB r) / (total positive cases).
      – Test for SSD: compute the relative similarity of all examples across time iterations, i.e. compare the 1000 x 1000 similarities between each pair of time iterations and average them. For each test iteration this yields a <# of time iterations> x <# of time iterations> grid of average similarities.
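The FCD and SSD checks above can be sketched as matrix computations, assuming each time iteration provides (for FCD) one EOB code per positive case and (for SSD) a feature vector per example; all names here are illustrative, and real cases may carry multiple EOB codes:

```python
import math

def eob_fraction_matrix(eob_by_iteration, eob_codes):
    """FCD check (sketch): cell (r, c) = (# positive cases in time iteration c
    with EOB code r) / (total positive cases in iteration c)."""
    return [[sum(case == code for case in cases) / len(cases)
             for cases in eob_by_iteration]
            for code in eob_codes]

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def avg_similarity_matrix(iterations):
    """SSD check (sketch): entry (i, j) is the average pairwise cosine
    similarity between the example vectors of time iterations i and j."""
    def mean_sim(A, B):
        sims = [_cosine(u, v) for u in A for v in B]
        return sum(sims) / len(sims)
    return [[mean_sim(A, B) for B in iterations] for A in iterations]
```

Plotting `eob_fraction_matrix` as a heatmap shows whether the EOB-reason mix drifts over the 23 iterations (FCD), while large off-diagonal drops in `avg_similarity_matrix` would indicate the examples themselves drifting (SSD).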