Analysing Invertebrate data using CABIN
Stephanie Strachan, Environment Canada
Columbia Basin Watershed Network Conference, Panorama, BC, Oct 2, 2009
DRAFT – Page 2 – April 21, 2023
Outline
• Brief intro to CABIN
• Data Sharing
• Your data in CABIN
• How CABIN analysis works
– RCA model & assessment
– RIVPACS
– DEMO
• Analysing data without a model
– Metrics
– Bray-Curtis
What is CABIN?
• Canadian Aquatic BIomonitoring Network
• Standardised biological monitoring for Canada
• Assessment of aquatic health
• Based on a network of networks
• Data Sharing & Partnerships!
Goals of CABIN
• To add a biological “effects” component to the national water quality monitoring program
• To identify streams where aquatic biota indicate reduced water quality
• Advise and report on status of freshwater quality in Canada with comparable, consistent and scientifically defensible data (e.g., future CESI reporting)
• To provide partners with reference data and tools to apply biological monitoring
CABIN Advantages
• CABIN provides a scientifically defensible assessment of your site
• As part of CABIN, you are part of a national assessment program
• You share reference data with other agencies, so you use the same benchmark as federal, provincial and municipal governments
• Adds value to a WQ monitoring program (e.g. detection of non-chemical impacts, verification of assumptions of chemical guidelines, addresses cumulative effects)
Why use invertebrates?
• Sedentary = reflect site-specific impacts
• Long-lived (1-3 yrs) = reflect cumulative impacts
• Diverse = respond to a wide range of stressors
• Ubiquitous = can be collected everywhere
• Key part of food web = ecologically important
• Commonly used = protocols are well developed
CABIN Methods
• Invertebrates reflect cumulative impacts, so we measure them annually in the fall
• Standardised collection methods of biota and habitat (for small and large rivers)
• Develop watershed baselines for assessments using a reference condition approach
• Compare potentially impacted sites to reference conditions
CABIN Tools
• Online resources
• Database (login)
• Mapping tool
• Analytical tool
• Reporting tool
• Link with other EC websites
• Online training modules and field certification
http://cabin.cciw.ca
Data Sharing Agreement
Current policy: 4 years from sampling date
Your data in CABIN
Check your data first
-view: site report in CABIN
-export: to check your data
-habitat data
-benthic data
RCA Overview
• Measure the range of desired biological conditions with habitat attributes (reference)
• Partition biological conditions into subsets
• Develop models for predicting biological subset from habitat
• Compare test site to appropriate subset
Understanding what is “acceptable”
Define the relationship between biology and habitat = predictive model
Reference site A, Reference site B
Headwater streams: low conductivity, shallow, low flow
Mid-sized streams: high conductivity, deep, fast flow
CABIN results
Biological Condition Categories
• Similar to Reference
• Mildly Divergent
• Divergent
• Highly Divergent
Within 90% = reference
[Figure: ordination plot of Axis 1 vs Axis 2 (both axes -2 to 2) showing reference sites, a test site, and the 90%, 99% and 99.9% ellipses]
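A minimal sketch of how a test site could be placed into one of the condition categories from its position relative to the 90%, 99% and 99.9% ellipses. It rests on several assumptions that the slide does not state: the ellipses are treated as bivariate-normal probability ellipses around the reference-site scores, the bands are taken in order (within 90% = Similar to Reference, 90-99% = Mildly Divergent, 99-99.9% = Divergent, outside 99.9% = Highly Divergent), and the reference scores below are hypothetical. It is not the CABIN/BEAST implementation.

```python
import math

# Assumed band order: each probability ellipse closes off one category.
BANDS = [
    (0.90, "Similar to Reference"),
    (0.99, "Mildly Divergent"),
    (0.999, "Divergent"),
]


def mahalanobis_sq(point, ref_points):
    """Squared Mahalanobis distance of a 2-D point from the reference cloud."""
    n = len(ref_points)
    mean_x = sum(x for x, _ in ref_points) / n
    mean_y = sum(y for _, y in ref_points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in ref_points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in ref_points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in ref_points) / (n - 1)
    det = var_x * var_y - cov_xy ** 2
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # d' * inverse(covariance) * d, written out for the 2 x 2 case
    return (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det


def condition_category(point, ref_points):
    """Return the category of the smallest ellipse that contains the point."""
    d2 = mahalanobis_sq(point, ref_points)
    for level, label in BANDS:
        # Chi-square quantile with 2 degrees of freedom is -2 * ln(1 - p)
        if d2 <= -2.0 * math.log(1.0 - level):
            return label
    return "Highly Divergent"


# Hypothetical reference-site scores on Axis 1 and Axis 2 (not CABIN data)
reference_scores = [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0), (0.0, 0.0)]

for test_site in [(0.5, 0.5), (2.5, 0.0), (3.5, 0.0), (4.0, 0.0)]:
    print(test_site, condition_category(test_site, reference_scores))
```

The further the test site sits from the centre of the reference cloud, the more divergent the category it receives.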
CABIN Site Assessment
[Figure: ordination plots for test site BYR0100 against the reference groups, with panels for Axis 1 vs Axis 2, Axis 1 vs Axis 3 and Axis 2 vs Axis 3; all axes run from -3 to 3]
Probability from CABIN/BEAST prediction:
           Grp1  Grp2  Grp3  Grp4
Test site  0.03  0.01  0.37  0.59
Assessment from CABIN/BEAST software:
Axis 1 v 2   Similar to Reference
Axis 1 v 3   Mildly Divergent
Axis 2 v 3   Mildly Divergent
Overall      Mildly Divergent
RIVPACS: Probability of taxa occurrence
FREQUENCY grp1 grp2 grp3 grp4
Clitellata 0.64 0.88 0.56 0.72
Ephemeroptera 0.98 0.93 0.99 0.91
Plecoptera 1.00 0.7 0.98 0.96
Trichoptera 0.92 0.88 0.9 0.89
Diptera (other) 0.25 0.3 0.41 0.27
Chironomidae 1.00 1.00 1.00 1.00
Arachnida (mites) 0.74 0.67 0.82 0.87
Probability from CABIN/BEAST prediction:
           Grp1  Grp2  Grp3  Grp4
Test site  0.03  0.01  0.37  0.59
Probability of seeing Clitellata at the test site
= sum of (frequency of occurrence in each group x probability of belonging to each group)
= 0.64(0.03) + 0.88(0.01) + 0.56(0.37) + 0.72(0.59)
= 0.66
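The weighted sum above can be sketched for every taxon in the frequency table. This is an illustration only, using the group probabilities and frequencies shown on this slide; the names `GROUP_PROB`, `FREQUENCY` and `occurrence_probability` are made up for the example and are not part of CABIN.

```python
# Probability of the test site belonging to each reference group (from the slide)
GROUP_PROB = [0.03, 0.01, 0.37, 0.59]

# Frequency of occurrence of each taxon in reference groups 1 to 4 (from the slide)
FREQUENCY = {
    "Clitellata": [0.64, 0.88, 0.56, 0.72],
    "Ephemeroptera": [0.98, 0.93, 0.99, 0.91],
    "Plecoptera": [1.00, 0.70, 0.98, 0.96],
    "Trichoptera": [0.92, 0.88, 0.90, 0.89],
    "Diptera (other)": [0.25, 0.30, 0.41, 0.27],
    "Chironomidae": [1.00, 1.00, 1.00, 1.00],
    "Arachnida (mites)": [0.74, 0.67, 0.82, 0.87],
}


def occurrence_probability(group_freqs, group_probs):
    """Sum over groups of (frequency in group) x (probability of group membership)."""
    return sum(freq * prob for freq, prob in zip(group_freqs, group_probs))


probabilities = {
    taxon: occurrence_probability(freqs, GROUP_PROB)
    for taxon, freqs in FREQUENCY.items()
}

for taxon, prob in probabilities.items():
    print(f"{taxon:<18} {prob:.4f}")
```

Because groups 3 and 4 carry 96% of the membership probability, each taxon's result stays close to its frequency in those two groups.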
RIVPACS: Calculating RIVPACS - example
Using the test-site group probabilities (from the BEAST prediction) and the taxon frequency table shown on the previous slide:
1. Calculate the probability of occurrence
2. Calculate the O:E scores (refer to the real data table from slide 15, for 1 site)
*Usually done at family level; order level is used for this exercise
RIVPACS: Example Results
Count Prob
Clitellata 100 0.66
Ephemeroptera 27 0.9419
Plecoptera 1 0.966
Trichoptera 35 0.8945
Diptera (other) 27 0.3215
Chironomidae 134 1.00
Arachnida (mites) 47 0.8456
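Expected taxa at P>0.70
Exp. = .9419 + .966 + .8945 + .8456 = 3.6 taxa
Obs. = 4
O:E = 4/3.6 = 1.1
Note: Chironomidae (P = 1.00) is left out of this sum; including it gives Exp. = 4.6, Obs. = 5 and O:E = 5/4.6 = 1.1, the same ratio to one decimal place.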
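A short sketch of the O:E calculation using the counts and probabilities from the example results table. One assumption to flag: the code applies the P > 0.70 rule to every taxon, so Chironomidae (P = 1.00) is counted in both the observed and expected totals, whereas the slide's sum omits it. The intermediate numbers therefore differ from the slide, but the ratio still rounds to 1.1. The function name `observed_expected` is hypothetical.

```python
THRESHOLD = 0.70

# (observed count, predicted probability of occurrence) from the example results
RESULTS = {
    "Clitellata": (100, 0.66),
    "Ephemeroptera": (27, 0.9419),
    "Plecoptera": (1, 0.966),
    "Trichoptera": (35, 0.8945),
    "Diptera (other)": (27, 0.3215),
    "Chironomidae": (134, 1.00),
    "Arachnida (mites)": (47, 0.8456),
}


def observed_expected(results, threshold=THRESHOLD):
    """Return (observed, expected, O:E) for taxa predicted with P > threshold."""
    predicted = [(count, prob) for count, prob in results.values() if prob > threshold]
    expected = sum(prob for _, prob in predicted)              # sum of probabilities
    observed = sum(1 for count, _ in predicted if count > 0)   # predicted taxa actually found
    return observed, expected, observed / expected


observed, expected, oe = observed_expected(RESULTS)
print(f"Observed = {observed}, Expected = {expected:.2f}, O:E = {oe:.2f}")
```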
RIVPACS: Results in CABIN
Taxon Count Prob
Chironomids 42 1.00
Ephemeroptera 31 0.95
Plecoptera 6 0.97
Trichoptera 17 0.90
Mites 0 0.85
Annelida 21 0.66
Coleoptera 0 0.32
Other non-insect 0 0.26
Collembola 0 0.14
other insects 10 0.01
Observed 4 taxa with P>0.70
Expected taxa at P>0.70 = sum of prob >0.70 = (1+0.95+0.97+0.90+0.85) = 4.67
O:E ratio = 4/4.67 = 0.86
For all Fraser River reference sites:
O:E P>0.70 mean = 1.04
90th percentile = 1.18
10th percentile = 0.78
• Sites within 0.78-1.18 are good
• Sites >1.18: enriched or diversity hot spots?
• Sites <0.78: impacted
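The screening rule can be sketched as a small function that computes O:E from the CABIN results table and compares it with the 10th and 90th percentiles of the Fraser River reference sites. The thresholds and table values come from this slide; the function names and the wording of the returned labels are assumptions for illustration.

```python
# 10th and 90th percentiles of O:E (P>0.70) for Fraser River reference sites
LOW, HIGH = 0.78, 1.18

# (taxon, count at the test site, predicted probability) from the results table
CABIN_RESULTS = [
    ("Chironomids", 42, 1.00),
    ("Ephemeroptera", 31, 0.95),
    ("Plecoptera", 6, 0.97),
    ("Trichoptera", 17, 0.90),
    ("Mites", 0, 0.85),
    ("Annelida", 21, 0.66),
    ("Coleoptera", 0, 0.32),
    ("Other non-insect", 0, 0.26),
    ("Collembola", 0, 0.14),
    ("other insects", 10, 0.01),
]


def oe_ratio(rows, threshold=0.70):
    """Observed / expected number of taxa among those predicted with P > threshold."""
    predicted = [(count, prob) for _, count, prob in rows if prob > threshold]
    expected = sum(prob for _, prob in predicted)
    observed = sum(1 for count, _ in predicted if count > 0)
    return observed / expected


def screen(oe, low=LOW, high=HIGH):
    """Place an O:E score relative to the reference-site percentile band."""
    if oe < low:
        return "impacted"
    if oe > high:
        return "enriched or diversity hot spot?"
    return "good"


oe = oe_ratio(CABIN_RESULTS)
print(f"O:E = {oe:.2f} -> {screen(oe)}")
```

Mites are predicted (P = 0.85) but absent, which is what pulls the observed count down to 4 of the 5 expected taxa.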
Demonstration
CABIN tools using Fraser River model
Columbia River model currently being developed. Expected completion Spring 2010
What does this mean?
How are they similar?
How are they different?
• an array of rows and columns
• data points are counts for each taxon for each sample
• these can be replicates, times, or treatments
Real Data
Order/Class     Family            Site 1  Site 2  Site 3
Arachnida                             47      55      18
Clitellata                           100      89      21
Diptera         Chironomidae         134     121      58
Diptera         Tipulidae              4       7      11
Diptera         Simuliidae            12       0       2
Diptera         Empididae             11       8      12
Trichoptera     Glossosomatidae       14       5       0
Trichoptera     Hydropsychidae        21      18      20
Ephemeroptera   Heptageniidae         18       5      18
Ephemeroptera   Baetidae               9       9      72
Ephemeroptera   Ephemerellidae         0       0       1
Ephemeroptera   Leptophlebiidae        0       0       2
Plecoptera      Perlidae               0       0       1
Plecoptera      Nemouridae             1       1      38
Plecoptera      Perlodidae             0       0       1
Plecoptera      Chloroperlidae         0       0       4
Plecoptera      Capniidae              0       0       4
Metrics
• Taxonomic richness: how many types of organisms?
– Ephemeroptera richness
– Plecoptera richness
– Trichoptera richness
• Composition metrics: what proportion of the community is made up of one or a few taxa?
– % EPT individuals
– % Chironomidae
– % non-insects
– % Dominance
• Tolerance metrics
– # tolerant taxa
– % intolerant individuals
• Ecological metrics
– % predators (other functional feeding groups)
– # clinger taxa
Check CABIN to see how each metric is calculated
Check the waterquality.ec.gc.ca website to see summary and how each responds to a perturbation
Real Data Metrics
Site 1 Site 2 Site 3
Abundance 371 318 283
Richness 11 10 16
# EPT taxa 5 5 10
# Ephemeroptera 2 2 4
# Plecoptera 1 1 5
# Trichoptera 2 2 1
% Chironomidae 36% 38% 20%
% EPT 17% 12% 57%
% dominance (top 3) 76% 83% 59%
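The metrics in this table can be reproduced from the Real Data counts with a few lines of code. The sketch below is illustrative only: it assumes Arachnida and Clitellata each count as one taxon, that richness is the number of taxa with a non-zero count, and that percentages are rounded to whole numbers. CABIN's own metric definitions should be checked in the database; `site_metrics` is a hypothetical name.

```python
EPT_ORDERS = ("Ephemeroptera", "Plecoptera", "Trichoptera")

# (order or class, family, counts at Site 1, Site 2, Site 3) from the Real Data table
REAL_DATA = [
    ("Arachnida", "", (47, 55, 18)),
    ("Clitellata", "", (100, 89, 21)),
    ("Diptera", "Chironomidae", (134, 121, 58)),
    ("Diptera", "Tipulidae", (4, 7, 11)),
    ("Diptera", "Simuliidae", (12, 0, 2)),
    ("Diptera", "Empididae", (11, 8, 12)),
    ("Trichoptera", "Glossosomatidae", (14, 5, 0)),
    ("Trichoptera", "Hydropsychidae", (21, 18, 20)),
    ("Ephemeroptera", "Heptageniidae", (18, 5, 18)),
    ("Ephemeroptera", "Baetidae", (9, 9, 72)),
    ("Ephemeroptera", "Ephemerellidae", (0, 0, 1)),
    ("Ephemeroptera", "Leptophlebiidae", (0, 0, 2)),
    ("Plecoptera", "Perlidae", (0, 0, 1)),
    ("Plecoptera", "Nemouridae", (1, 1, 38)),
    ("Plecoptera", "Perlodidae", (0, 0, 1)),
    ("Plecoptera", "Chloroperlidae", (0, 0, 4)),
    ("Plecoptera", "Capniidae", (0, 0, 4)),
]


def site_metrics(site_index):
    """Compute the slide's metrics for one site (0 = Site 1, 1 = Site 2, 2 = Site 3)."""
    rows = [(order, family, counts[site_index]) for order, family, counts in REAL_DATA]
    present = [row for row in rows if row[2] > 0]
    abundance = sum(count for _, _, count in rows)
    ept_individuals = sum(count for order, _, count in rows if order in EPT_ORDERS)
    chironomidae = sum(count for _, family, count in rows if family == "Chironomidae")
    top3 = sum(sorted((count for _, _, count in rows), reverse=True)[:3])
    metrics = {
        "Abundance": abundance,
        "Richness": len(present),
        "# EPT taxa": sum(1 for order, _, _ in present if order in EPT_ORDERS),
    }
    for order in EPT_ORDERS:
        metrics[f"# {order}"] = sum(1 for o, _, _ in present if o == order)
    metrics["% Chironomidae"] = round(100 * chironomidae / abundance)
    metrics["% EPT"] = round(100 * ept_individuals / abundance)
    metrics["% dominance (top 3)"] = round(100 * top3 / abundance)
    return metrics


for index in range(3):
    print(f"Site {index + 1}:", site_metrics(index))
```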
Metrics Results
Family level metrics Test site Reference
Abundance 937 111-2788
Total Richness 12 7-28
EPT Richness 4 2-17
% EPT 64.5 9.2-98.5
% Dominance (top 3 taxa) 84.5 44.4-96.9
% Chironomidae 18.7 0.3-87.1
% non-insects 19.6 0-22.4
# Ephemeroptera taxa 1 0-6
# Plecoptera taxa 2 2-7
# Trichoptera taxa 1 0-6
But what do we compare this to?
• Upstream?
• Gradient?
• “Before” sample?
• Reference sites?
Assessed using target value, t-test, ANOVA
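When reference sites are the comparison, one very simple screening check is whether each test-site metric falls inside the range listed in the Reference column. The sketch below assumes that column gives the minimum and maximum across reference sites and treats the bounds as inclusive; it is a first look only and does not replace a target value, t-test or ANOVA. The name `outside_reference_range` is hypothetical.

```python
# (metric, test-site value, reference minimum, reference maximum) from the table
METRICS = [
    ("Abundance", 937, 111, 2788),
    ("Total Richness", 12, 7, 28),
    ("EPT Richness", 4, 2, 17),
    ("% EPT", 64.5, 9.2, 98.5),
    ("% Dominance (top 3 taxa)", 84.5, 44.4, 96.9),
    ("% Chironomidae", 18.7, 0.3, 87.1),
    ("% non-insects", 19.6, 0.0, 22.4),
    ("# Ephemeroptera taxa", 1, 0, 6),
    ("# Plecoptera taxa", 2, 2, 7),
    ("# Trichoptera taxa", 1, 0, 6),
]


def outside_reference_range(metrics):
    """Return the names of metrics whose test value lies outside [minimum, maximum]."""
    return [name for name, value, low, high in metrics if not low <= value <= high]


flagged = outside_reference_range(METRICS)
print(flagged if flagged else "All metrics fall within the reference range")
```

For this test site nothing is flagged, although several values (# Plecoptera taxa, % non-insects) sit close to an edge of the reference range.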
Similarity among sites in a stream
Which sites are most similar?
Site 1 Site 2 Site 3 Site 4
Arachnida 47 55 18 5
Clitellata 100 89 21 88
Diptera (Chironomidae) 134 121 58 126
Diptera (other) 27 15 25 16
Trichoptera 35 23 20 8
Ephemeroptera 27 14 93 41
Plecoptera 1 1 48 22
Similarity Coefficient
• S = 0 if two samples have no species in common
• S = 100 if two samples are identical
• CABIN uses Bray-Curtis Similarity Coefficient
Because:
• A scale change in measurements does not change S, as all y values are multiplied by the same constant
• Joint absences have no effect on S; this is not so for all coefficients
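These properties can be checked numerically. The sketch below uses the standard Bray-Curtis similarity (100 times one minus the sum of absolute differences over the sum of all counts) with the Site 1 and Site 2 counts from the stream example; the function name and the small two-species samples are made up for illustration.

```python
def bray_curtis_similarity(sample_a, sample_b):
    """Bray-Curtis similarity (0 to 100) between two lists of taxon counts."""
    total_difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    total_count = sum(a + b for a, b in zip(sample_a, sample_b))
    return 100.0 * (1.0 - total_difference / total_count)


site_1 = [47, 100, 134, 27, 35, 27, 1]
site_2 = [55, 89, 121, 15, 23, 14, 1]

base = bray_curtis_similarity(site_1, site_2)

# Scale change: multiply every count in both samples by the same constant
scaled = bray_curtis_similarity([10 * y for y in site_1], [10 * y for y in site_2])

# Joint absence: add a taxon that is missing from both samples
with_joint_absence = bray_curtis_similarity(site_1 + [0], site_2 + [0])

# End points of the scale
no_taxa_in_common = bray_curtis_similarity([5, 0], [0, 3])
identical = bray_curtis_similarity(site_1, site_1)

print(f"base = {base:.1f}, scaled = {scaled:.1f}, joint absence = {with_joint_absence:.1f}")
print(f"no taxa in common = {no_taxa_in_common:.0f}, identical = {identical:.0f}")
```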
Similarity Matrix
• Calculated between every pair of samples: n(n-1)/2 comparisons
• Displayed in a lower triangular matrix
• Similarity matrices are the basis of most multivariate methods
Sites: Site 1, Site 2, Site 3, Site 4
Comparisons: Site 1 vs Site 2, Site 1 vs Site 3, Site 1 vs Site 4, Site 2 vs Site 3, Site 2 vs Site 4, Site 3 vs Site 4
Site 1 Site 2 Site 3 Site 4
Site 1 100 - - -
Site 2 S12 100 - -
Site 3 S13 S23 100 -
Site 4 S14 S24 S34 100
Calculating Bray-Curtis Similarity
$$S_{jk} = 100\left\{1 - \frac{\sum_{i=1}^{p} |y_{ij} - y_{ik}|}{\sum_{i=1}^{p} (y_{ij} + y_{ik})}\right\}$$
Species (S1 to S5) by site (A1 to A3):
Site   S1   S2   S3   S4   S5   Total
A1      2    5    2    5    3      17
A2      3    5    2    4    3      17
A3      9    1    1    1    1      13
Similarity between sites:
S A1,A2 = 100 x (1 - (1+0+0+1+0) / (17+17)) = 100 x (1 - 0.059) = 94.1%
S A1,A3 = 100 x (1 - (7+4+1+4+2) / (17+13)) = 100 x (1 - 0.600) = 40.0%
S A2,A3 = 100 x (1 - (6+4+1+3+2) / (17+13)) = 100 x (1 - 0.533) = 46.7%
With n = 3 sites there are n(n-1)/2 = 3(3-1)/2 = 3 similarity coefficients to calculate.
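The worked example can be reproduced by looping over every pair of sites. This sketch uses the three-site, five-species counts from this slide; `itertools.combinations` generates the n(n-1)/2 pairs, and the function and variable names are assumptions for illustration.

```python
from itertools import combinations

# Counts of species S1 to S5 at sites A1 to A3 (from the slide)
COUNTS = {
    "A1": [2, 5, 2, 5, 3],
    "A2": [3, 5, 2, 4, 3],
    "A3": [9, 1, 1, 1, 1],
}


def bray_curtis_similarity(sample_j, sample_k):
    """100 * (1 - sum|y_ij - y_ik| / sum(y_ij + y_ik)) over all species i."""
    numerator = sum(abs(yj - yk) for yj, yk in zip(sample_j, sample_k))
    denominator = sum(yj + yk for yj, yk in zip(sample_j, sample_k))
    return 100.0 * (1.0 - numerator / denominator)


pairs = list(combinations(COUNTS, 2))  # n(n-1)/2 pairs of site names
similarities = {
    (j, k): bray_curtis_similarity(COUNTS[j], COUNTS[k]) for j, k in pairs
}

print(f"{len(pairs)} similarity coefficients")
for (j, k), value in similarities.items():
    print(f"S {j},{k} = {value:.1f}%")
```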
Site 1 vs Site 2
         Arach  Clitell  Chiron  Other Dip  Trichop  Ephem  Plec  Total
Site 1      47      100     134         27       35     27     1    371
Site 2      55       89     121         15       23     14     1    318
Sum        102      189     255         42       58     41     2    689
|Diff|       8       11      13         12       12     13     0     69
S = 100 x (1 - [69 / 689]) = 100 x (1 - 0.100) = 100 x 0.90 = 90
Similarity Matrix
Site 1 Site 2 Site 3 Site 4
Site 1 100 - - -
Site 2 90 100 - -
Site 3 52.0 48.9 100 -
Site 4 80.1 80.8 58.1 100
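The matrix above can be rebuilt from the four-site counts in the "Similarity among sites in a stream" table, which also answers the question of which sites are most similar. A minimal sketch, with hypothetical names and the same Bray-Curtis formula used in the worked examples:

```python
# Counts per site in the order: Arachnida, Clitellata, Chironomidae,
# Diptera (other), Trichoptera, Ephemeroptera, Plecoptera
COUNTS = {
    "Site 1": [47, 100, 134, 27, 35, 27, 1],
    "Site 2": [55, 89, 121, 15, 23, 14, 1],
    "Site 3": [18, 21, 58, 25, 20, 93, 48],
    "Site 4": [5, 88, 126, 16, 8, 41, 22],
}


def bray_curtis_similarity(sample_j, sample_k):
    """Bray-Curtis similarity (0 to 100) between two lists of taxon counts."""
    numerator = sum(abs(yj - yk) for yj, yk in zip(sample_j, sample_k))
    denominator = sum(yj + yk for yj, yk in zip(sample_j, sample_k))
    return 100.0 * (1.0 - numerator / denominator)


sites = list(COUNTS)

# Lower triangle only: each row site is compared with the sites listed before it
matrix = {}
for row, site_j in enumerate(sites):
    for site_k in sites[:row]:
        matrix[(site_j, site_k)] = bray_curtis_similarity(COUNTS[site_j], COUNTS[site_k])

# Print in the same lower triangular layout as the slide
print(" " * 8 + "".join(f"{name:>8}" for name in sites))
for row, site_j in enumerate(sites):
    cells = [f"{matrix[(site_j, site_k)]:8.1f}" for site_k in sites[:row]]
    cells.append(f"{100.0:8.1f}")                      # a site compared with itself
    cells += [f"{'-':>8}"] * (len(sites) - row - 1)    # upper triangle left blank
    print(f"{site_j:<8}" + "".join(cells))

most_similar = max(matrix, key=matrix.get)
least_similar = min(matrix, key=matrix.get)
print("Most similar:", most_similar, "Least similar:", least_similar)
```

Sites 1 and 2 are the closest pair, while Site 3 stands apart from the other three because of its much higher Ephemeroptera and Plecoptera counts.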
Data Analysis Summary
Need to compare to something
What was your objective?
How were your sites selected?
• Metrics (target value or Index)
– B-IBI calibrated Index for your region
• Upstream-downstream (t-test, ANOVA)
– Using metric or similarity or individual taxa counts
• Gradient Analysis
– Using metric or similarity or individual taxa counts
• RCA
– Set of reference sites; using all taxa in ordination plots
– RIVPACS
– Metrics compared to reference
• Simplest – GRAPH IT!
Simple graphs
[Figure: three graphs for samples JOS01-07, JOS01-08, JOS02-07, JOS02-08, JOS03-07 and JOS03-08: Abundance (axis 0 to 30000); Richness and EPT Richness (axis 0 to 30); % Diptera and % Chironomidae (axis 0 to 0.5)]
• Abundance much higher in JOS03
• EPT richness pattern follows Total Richness – report only 1 of these
• Chironomidae are the dominant Dipteran and make up a similar proportion of Diptera at all sites
Data Interpretation
• CABIN is a screening tool
– Tells us if there is a problem, not what the problem is
– Components of the community give us clues about what the problem might be
– Used to complement WQ chemical data
– Can also evaluate habitat disturbance
– Can be used to track changes over time
• Need to do further investigation to determine the cause of the problem detected