Post on 13-Jan-2016
Computer Networks Seminar, Spring 2007
A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives
Miguel Jimeno and Ken Christensen
Department of Computer Science and Engineering
University of South Florida
Tampa, FL 33620
{mjimeno, christen}@cse.usf.edu
- Introduction & Background
- Research Problem
- The SmartNIC
- The New Design: Best-of-N Method
- Analysis of Best-of-N Method
- Numerical Results & Experiments Evaluation
- Summary & Future Work
Outline
The Internet consumes 2% of all the electricity consumed in the US [1].
An average PC consumes 120 W when fully powered on [10].
PCs could add 10% to typical US residential electricity consumption.
P2P applications keep the PC “on the net” all the time, even though the PC is idle 99% of the time.
Introduction
[1]K. Kawamoto, J. Koomey, B. Nordman, R. Brown, M. Piette, M. Ting, and A. Meier, “Electricity Used by Office Equipment and Network Equipment in the U.S.: Detailed Report and Appendices,” Technical Report LBNL-45917, Energy Analysis Department, Lawrence Berkeley National Laboratory, 2001.
Can a P2P application be run on a small, low-power microcontroller?
The PC could then be power managed.
The microcontroller can’t store a large list of file names.
Bloom Filters: Bloom filters are a well-known probabilistic data structure for representing a list of file name strings.
Introduction
Bloom Filters: A group of hash functions is used to map elements into an array of bits.
Figure 1. Bloom filter of size m bits, and k = 4 hash functions. Image taken from [9].
False negatives are not possible, but there is a probability of generating false positives:
$$\Pr[\text{false positive}] = \left(\frac{s}{m}\right)^{k}$$
where m = size of the Bloom filter in bits, k = number of hash functions used to calculate a Bloom filter, and s = number of bits set.
Introduction
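A minimal sketch of the Bloom filter described above, with the false positive estimate (s/m)^k computed from the number of bits set. The class name `BloomFilter`, the salted SHA-256 hashing, and the sample file names are illustrative assumptions and are not taken from the slides.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits with k hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item):
        # Assumed hashing scheme: k salted SHA-256 digests reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Set the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # An item is reported present only if all k of its bits are set.
        return all(self.bits[pos] for pos in self._positions(item))

    def bits_set(self):
        return sum(self.bits)

    def false_positive_estimate(self):
        # Pr[false positive] = (s / m)^k, with s the number of bits set.
        return (self.bits_set() / self.m) ** self.k


bf = BloomFilter(m=1024, k=4)
for name in ["song_a.mp3", "song_b.mp3", "song_c.mp3"]:
    bf.add(name)
print(bf.bits_set(), bf.false_positive_estimate())
```

Every inserted name tests positive, so false negatives cannot occur; a name that was never inserted tests positive only if all k of its bits happen to be set.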
Background
Bloom filters were first proposed by Bloom [2]
Kirsch and Mitzenmacher proposed a way to compute a Bloom filter with less hashing [6]
Lumetta and Mitzenmacher used the Power of Two Choices to compute the Bloom filter [7]
[2] B. Bloom, “Space/Time Tradeoffs in Hash Coding with Allowable Errors,” Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.
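As a rough illustration of the "less hashing" idea of Kirsch and Mitzenmacher, the k bit positions can be derived from two base hashes as g_i(x) = h1(x) + i * h2(x) mod m. The function name and the choice of splitting one SHA-256 digest into h1 and h2 are assumptions made for this sketch.

```python
import hashlib


def double_hash_positions(item, k, m):
    """Derive k bit positions from two base hashes (illustrative sketch)."""
    digest = hashlib.sha256(item.encode()).digest()
    # Assumption: the two base hashes are two halves of one SHA-256 digest.
    h1 = int.from_bytes(digest[:16], "big")
    h2 = int.from_bytes(digest[16:], "big")
    # g_i(x) = h1(x) + i * h2(x) mod m, for i = 0 .. k-1
    return [(h1 + i * h2) % m for i in range(k)]


positions = double_hash_positions("song_a.mp3", k=4, m=1024)
print(positions)
```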
We investigated new methods for reducing the probability of false positives for a Bloom filter for fixed m and n.
The target is the implementation of this structure in a power management proxy.
Research Problem
NICs support protocols up to the MAC layer, but can’t respond to higher-layer packets.
A PC needs to be fully powered on in order to respond to packets.
Applications like P2P file sharing require the PC to be fully powered on all the time.
To manage power in PCs running P2P applications:
- We are studying the idea of using a small controller to proxy for a sleeping PC.
The SmartNIC
This proxy will be able to maintain P2P TCP connections and respond to query messages.
We are exploring locating the controller on the NIC, so it’s a “SmartNIC”.
[Figure 2. The SmartNIC with proxy capability. Panels (a) and (b) show the PC fully on and the PC in sleep with proxying enabled. Labels: NIC (internal to PC), Internet, traffic flow from/to PC, traffic flow from/to NIC.]
The SmartNIC
Best-of-N method: N instances of a Bloom filter are generated and the instance with the fewest bits set to 1 is selected.
The “winner” hash group is then used to test the Bloom filter.
[Figure 3. Best-of-N method for Bloom filter. Flow: the set of strings to be inserted is used to generate a Bloom filter instance with the next seeded group of hash functions; a new instance is selected if its number of bits set is the smallest; this is repeated N times to produce the final instance.]
The New Design: Best-of-N method
1) What improvement in Pr[false positive] can be achieved?
2) What is the computational cost to generate the filter?
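The Best-of-N selection loop can be sketched as follows. This is a minimal illustration under assumptions: the helper names, the seeded SHA-256 hash groups, and the sample strings are hypothetical, and the slides do not prescribe this particular hashing.

```python
import hashlib


def positions(item, k, m, seed):
    # Assumed seeded hash group: the seed selects one family of k hashes.
    out = []
    for i in range(k):
        digest = hashlib.sha256(f"{seed}:{i}:{item}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % m)
    return out


def build(items, k, m, seed):
    # Generate one Bloom filter instance for the given hash group.
    bits = [0] * m
    for item in items:
        for pos in positions(item, k, m, seed):
            bits[pos] = 1
    return bits


def best_of_n(items, k, m, n_instances):
    # Keep the instance with the fewest bits set, and remember its seed.
    best_seed, best_bits = None, None
    for seed in range(n_instances):
        bits = build(items, k, m, seed)
        if best_bits is None or sum(bits) < sum(best_bits):
            best_seed, best_bits = seed, bits
    return best_seed, best_bits


def contains(item, bits, k, m, seed):
    # Membership must be tested with the winning hash group.
    return all(bits[pos] for pos in positions(item, k, m, seed))


names = [f"track_{i}.mp3" for i in range(100)]
seed, bits = best_of_n(names, k=5, m=800, n_instances=20)
single = build(names, k=5, m=800, seed=0)
print(seed, sum(bits), sum(single))
```

The winning seed has to be stored alongside the filter, since lookups only work with the same hash group that built the selected instance.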
In order to compute N instances quickly, we developed a new pseudo-hashing method called “RNG hashing”.
This method, based on a Random Number Generator, generates multiple hashes from one initial “seed” hash.
The New Design: Best-of-N method
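The slides do not give the details of RNG hashing, so the following is only a guess at the general idea: compute one seed hash per string, use it to seed a pseudo-random number generator, and draw the k bit positions from that generator. The use of CRC32 as the seed hash, Python's `random.Random` as the generator, and the way the instance number is mixed into the seed are all assumptions.

```python
import random
import zlib


def rng_hash_positions(item, k, m, instance=0):
    """Derive k bit positions from a single seed hash (illustrative sketch)."""
    # One real hash computation per string (assumed here to be CRC32).
    seed_hash = zlib.crc32(item.encode())
    # Assumption: each Best-of-N instance perturbs the seed differently.
    rng = random.Random(seed_hash ^ (instance * 0x9E3779B1))
    # The remaining "hashes" are just successive pseudo-random draws.
    return [rng.randrange(m) for _ in range(k)]


first = rng_hash_positions("song_a.mp3", k=4, m=1024, instance=0)
again = rng_hash_positions("song_a.mp3", k=4, m=1024, instance=0)
print(first)
```

Because the generator is deterministic for a given seed, the same string always maps to the same positions, which is what a lookup requires.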
We define S to be the random variable for the number of bits set in a Bloom filter.
Using order statistics we can determine the distribution of the minimum value of the independent samples S1, S2, …, SN (selected as Best-of-N):
$$S_{\min} = S_{(1)} = \min(S_1, S_2, \ldots, S_N)$$
For order statistics, if the density f(s) and the distribution function F(s) of S are known, then the distribution of S_min is determined.
Analysis of Best-of-N Method
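For a continuous distribution,
$$f_{\min}(s) = N\,\bigl(1 - F(s)\bigr)^{N-1} f(s)$$
The mean can be computed as
$$E[S_{\min}] = \int s\, f_{\min}(s)\, ds$$
Based on heuristic and empirical evidence, the distribution of S appears to be close to normal. Now we have that
$$f_{\min}(s) = N \left[\frac{1}{2}\,\mathrm{erfc}\!\left(\frac{s-\mu}{\sigma\sqrt{2}}\right)\right]^{N-1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(s-\mu)^{2}}{2\sigma^{2}}}$$
where μ = E[S] and σ = σ[S]. We know that
$$E[S] = m\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)$$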
Analysis of Best-of-N Method
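As a sanity check on the mean E[S] = m(1 - (1 - 1/m)^(kn)), the sketch below compares it with a small simulation that sets kn uniformly random bit positions. The parameter values, the random seed, and the function names are arbitrary choices for illustration, and the simulation assumes ideal, independent hash outputs.

```python
import random


def expected_bits_set(m, n, k):
    # E[S] = m * (1 - (1 - 1/m)^(k*n))
    return m * (1 - (1 - 1 / m) ** (k * n))


def simulate_bits_set(m, n, k, rng):
    # Assumption: each of the k*n hash outputs is uniform and independent.
    bits = [0] * m
    for _ in range(k * n):
        bits[rng.randrange(m)] = 1
    return sum(bits)


rng = random.Random(2007)
m, n, k = 1600, 100, 11
samples = [simulate_bits_set(m, n, k, rng) for _ in range(200)]
sample_mean = sum(samples) / len(samples)
analytic_mean = expected_bits_set(m, n, k)
print(sample_mean, analytic_mean)
```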
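We derive
$$\sigma^{2}[S] = m\left(\frac{m-1}{m}\right)^{kn} + m^{2}\left(\frac{m-2}{m}\right)^{kn} - m\left(\frac{m-2}{m}\right)^{kn} - m^{2}\left(\frac{m-1}{m}\right)^{2kn}$$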
The probability of false positive for our method is then:
$$\Pr[\text{false positive}] = \left(\frac{E[S_{\min}]}{m}\right)^{k}$$
where E[S_min] is computed by substituting above.
Analysis of Best-of-N Method
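The analysis can be evaluated numerically along these lines: compute the mean and variance of S, integrate s * f_min(s) under the normal approximation to get E[S_min], and raise E[S_min]/m to the power k. The trapezoidal integration over mu ± 10 sigma, the step count, and the function names are choices made for this sketch rather than details from the slides.

```python
import math


def mean_bits(m, n, k):
    return m * (1 - ((m - 1) / m) ** (k * n))


def var_bits(m, n, k):
    p = ((m - 1) / m) ** (k * n)  # probability a given bit stays 0
    q = ((m - 2) / m) ** (k * n)  # probability two given bits both stay 0
    return m * p + m * (m - 1) * q - (m * p) ** 2


def expected_min(mu, sigma, n_instances, steps=20000):
    # Trapezoidal rule for the integral of s * f_min(s) over mu +/- 10 sigma.
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        s = lo + i * h
        z = (s - mu) / sigma
        survival = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - F(s)
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        f_min = n_instances * survival ** (n_instances - 1) * density
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * s * f_min
    return total * h


def false_positive(m, n, k, n_instances):
    mu = mean_bits(m, n, k)
    sigma = math.sqrt(var_bits(m, n, k))
    return (expected_min(mu, sigma, n_instances) / m) ** k


m, n, k = 16000, 1000, 11
p1 = false_positive(m, n, k, 1)
p100 = false_positive(m, n, k, 100)
print(p1, p100, p1 / p100)
```

The ratio `p1 / p100` corresponds to the improvement factor of Best-of-N with N = 100 over a single instance for these parameters.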
For a given m and n where k is chosen optimally, we study the probability of false positive as a function of N.
[Figure 4. Improvement factor for various m/n. x-axis: number of instances (N), 0 to 100; y-axis: improvement factor, 1.00 to 1.30; curves for m/n = 8, 16, 32, and 64; slide annotation: 30%.]
Numerical Results
[Figure 5. Probability of false positive for m/n = 16. x-axis: number of instances (N), 0 to 100; y-axis: Pr[false positive], 4.0E-04 to 5.0E-04.]
[Figure 6. Probability of false positive for m/n = 32. x-axis: number of instances (N), 0 to 100; y-axis: Pr[false positive], 1.4E-07 to 2.4E-07.]
For Figure 5, n = 1000 and m = 16,000. For Figure 6, n is the same, but m = 32,000.
Numerical Results
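For the parameters used in Figures 5 and 6, the optimal number of hash functions and the single-instance (N = 1) false positive probability can be worked out as below. The sketch assumes the standard result k ≈ (m/n) ln 2 rounded to the nearest integer, which the slides imply but do not state, and approximates s by E[S].

```python
import math


def optimal_k(m, n):
    # Standard Bloom filter result: k is about (m/n) * ln 2 (rounded here).
    return max(1, round((m / n) * math.log(2)))


def baseline_false_positive(m, n, k):
    # N = 1 case: use E[S]/m as the expected fraction of bits set.
    expected_fill = 1 - (1 - 1 / m) ** (k * n)
    return expected_fill ** k


n = 1000
results = {}
for m in (16000, 32000):
    k = optimal_k(m, n)
    results[m] = (k, baseline_false_positive(m, n, k))
    print(m, k, results[m][1])
```

The resulting probabilities fall inside the y-axis ranges of Figures 5 and 6 respectively.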
Environment
- Dell OptiPlex GX620 PC (Pentium 4, 3.4 GHz, 2 MBytes cache) with 1 GByte RAM.
- Windows XP, gcc compiler (version 3.4.2 mingw-special from Dev C++).
- A list of 25,000 strings of unique music file names was obtained using BearShare 5.2.
Response Variables
- Probability of false positive for the Bloom filter.
- Execution time to generate a Bloom filter.
Experiments Evaluation
Control variables
- Hashing method used: CRC32, MD5, RNG method, Kirsch method.
- Bloom filter parameters m, n, and k.
- Best-of-N parameter N.
- Number of strings used in the string test set.
Experiments Description
- False Positive Exp 1: Vary N, measure Prob. of False Positive.
- False Positive Exp 2: Vary N, measure False Pos.
- Run-time experiment: Collect CPU time for each N.
Experiments Evaluation
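A measurement of the two response variables could look roughly like this: build a filter from one set of strings, probe it with strings known to be absent, and record the fraction of false positives together with the CPU time. The synthetic file names, the SHA-256 hashing, and the small parameter values are stand-ins for illustration and are not the authors' test setup.

```python
import hashlib
import time


def positions(item, k, m):
    # Assumed hashing: k salted SHA-256 digests reduced mod m.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def build(items, k, m):
    bits = [0] * m
    for item in items:
        for pos in positions(item, k, m):
            bits[pos] = 1
    return bits


def measure(inserted, probes, k, m):
    start = time.process_time()  # CPU time, not wall-clock time
    bits = build(inserted, k, m)
    cpu_seconds = time.process_time() - start
    # Probes are disjoint from the inserted set, so every hit is a false positive.
    hits = sum(1 for p in probes if all(bits[pos] for pos in positions(p, k, m)))
    return hits / len(probes), cpu_seconds


inserted = [f"file_{i}.mp3" for i in range(200)]
probes = [f"absent_{i}.mp3" for i in range(5000)]
rate, cpu_seconds = measure(inserted, probes, k=3, m=1600)
print(rate, cpu_seconds)
```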
[Figure 7. Results from false positive experiment #1. x-axis: number of instances (N), 0 to 100; y-axis: Pr[false positive], 4.0E-04 to 4.6E-04; series: Analysis, MD5, CRC32, RNG method, Kirsch.]
[Figure 8. Results from run time experiment. x-axis: number of instances (N), 0 to 100; y-axis: CPU time (sec), 0.0 to 0.6; series: MD5, CRC32, RNG method, Kirsch.]
The experimental results for probability of false positive perfectly agree with the analysis.
The CPU time of the RNG method was as good as that of the Kirsch method, and better than that of CRC32.
Experiments Evaluation
Two improvements to Bloom filters:
- A new Best-of-N method that reduces the probability of false positive by generating N instances of a Bloom filter and selecting the best one.
- A new RNG hashing method that generates pseudo-hashes given a single seed hash.
Bloom filters could be implemented in a power management proxy for P2P applications.
Savings of up to 85 Mill. could be obtained if 25% of PCs running P2P applications use SmartNICs.
Summary & Future Work
3. A. Broder and M. Mitzenmacher, “Network Applications of Bloom Filters: A Survey,” Internet Mathematics, Vol. 1, No. 4, pp. 485-509, 2005.
4. Energy Information Administration, “U.S. Household Electricity Report,” July 2005. Available: http://www.eia.doe.gov/emeu/reps/enduse/er01_us.html.
5. L. Fan, P. Cao, and J. Almeida, “Bloom Filters - The Math,” 2000. Available: http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html.
6. A. Kirsch and M. Mitzenmacher, “Less Hashing, Same Performance: Building a Better Bloom Filter,” Technical Report TR-02-5, Computer Science Group, Harvard University, 2005.
7. S. Lumetta and M. Mitzenmacher, “Using the Power of Two Choices to Improve Bloom Filters,” unpublished, 2006. Available: http://www.eecs.harvard.edu/~michaelm/postscripts/bftwo.ps.
8. A. Pagh, R. Pagh, and S. Rao, “An Optimal Bloom Filter Replacement,” Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 823-829, 2005.
9. http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html
10. US Department of Energy, Energy Efficiency and Renewable Energy, “Estimating Appliance and Home Electronic Energy Use,” 2005. Available: http://www.eere.energy.gov/consumer/your_home/appliances/index.cfm/mytopic=10040.
References
Thanks!
I’ll be happy to answer any questions.