The Sao Paulo Advanced School of Cryptography — SP-ASCrypto 2011
Hardware Implementation of Pairings
Francisco Rodríguez-Henríquez, CINVESTAV-IPN, Mexico City, Mexico
Joint work with:
Jean-Luc Beuchat, LCIS, University of Tsukuba, Japan
Nicolas Brisebarre, Arenaire, LIP, ENS Lyon, France
Jérémie Detrey, Caramel, INRIA Nancy Grand-Est, France
Nicolas Estibals, Caramel, INRIA Nancy Grand-Est, France
Eiji Okamoto, LCIS, University of Tsukuba, Japan
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (1 / 91)
Outline of the talk
1 Context and motivation
2 Computing the Tate Pairing
3 Case of Study #1: A compact implementation of the ηT pairing
4 Case of Study #2: Estibals’ composite ηT pairing
5 Case of Study #3: A fast implementation of the ηT pairing
6 Wish list on hardware implementation of pairings (some concrete open problems)
Agenda
1 Context and motivation
    Bilinear pairings defined over elliptic curves: basic definitions
    But... why should one bother implementing pairings in hardware?
    A quick overview of reconfigurable hardware devices
2 Computing the Tate pairing
    The Tate pairing over supersingular elliptic curves
    The Tate pairing over ordinary elliptic curves
3 Case of Study #1: A compact implementation of the ηT pairing
    Computing the reduced Tate pairing
    Arithmetic over $\mathbb{F}_{3^m}$
    Results obtained
4 Case of Study #2: Estibals’ composite ηT pairing
    Attacks
5 Case of Study #3: A fast implementation of the ηT pairing
    Implementation results in hardware
6 Wish list on hardware implementation of pairings (some concrete open problems)
Elliptic curves
E defined by a Weierstraß equation of the form
    $y^2 = x^3 + Ax + B$
E(K): the set of rational points over a field K
Additive group law over E(K)
Many applications in cryptography since 1985
    - EC-based Diffie-Hellman key exchange
    - EC-based Digital Signature Algorithm
    - ...
Interest: smaller keys than usual cryptosystems (RSA, ElGamal, ...)
But there’s more: bilinear pairings
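The additive group law can be made concrete on a toy curve. The following is a minimal sketch, assuming the small textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$; these parameters, the function names `on_curve` and `ec_add`, and the use of `None` for the point at infinity are illustrative choices, not taken from the talk.

```python
# Toy illustration of the group law on E(F_p): y^2 = x^3 + A*x + B.
# Assumed parameters (not from the talk): p = 17, A = 2, B = 2.
# Requires Python 3.8+ for pow(x, -1, p) (modular inverse).
P_MOD, A, B = 17, 2, 2


def on_curve(pt):
    """True if pt is the point at infinity (None) or satisfies the curve equation."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def ec_add(p1, p2):
    """Add two points of E(F_p) with the chord-and-tangent rule."""
    if p1 is None:          # O + Q = Q
        return p2
    if p2 is None:          # P + O = P
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None         # P + (-P) = O (also covers doubling a point with y = 0)
    if p1 == p2:
        # Tangent slope for doubling
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        # Chord slope for distinct points
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


if __name__ == "__main__":
    P = (5, 1)
    print(on_curve(P), ec_add(P, P))
```

Real implementations avoid the field inversion in every addition by using projective coordinates; the affine formulas are used here only because they are the shortest to read.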
Group cryptography
Let (G1, +) be an additively-written cyclic group of prime order #G1 = ℓ
P, a generator of the group: G1 = ⟨P⟩
Scalar multiplication: for any integer k, we have
    $kP = \underbrace{P + P + \cdots + P}_{k \text{ times}}$
    P, k  →  kP
Discrete logarithm: given Q ∈ G1, compute k such that Q = kP
    Q = kP  →  k
We assume that the discrete logarithm problem (DLP) in G1 is hard
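The asymmetry between the two directions can be sketched in a few lines. The example below assumes the same toy curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with generator P = (5, 1) of prime order 19 (illustrative parameters, not from the talk). `scalar_mul` uses double-and-add, which needs about $\log_2 k$ group operations, while the hypothetical helper `brute_force_dlog` simply tries every k, which is only feasible because the group is tiny.

```python
# Scalar multiplication versus discrete logarithm on a toy group.
# Assumed parameters (not from the talk): E: y^2 = x^3 + 2x + 2 over F_17,
# generator GEN = (5, 1), group order ORDER = 19. Requires Python 3.8+.
P_MOD, A = 17, 2
GEN, ORDER = (5, 1), 19


def ec_add(p1, p2):
    """Affine chord-and-tangent addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def scalar_mul(k, pt):
    """Compute kP for k >= 0 by right-to-left double-and-add."""
    result, addend = None, pt
    while k > 0:
        if k & 1:
            result = ec_add(result, addend)   # add when the current bit is 1
        addend = ec_add(addend, addend)       # double at every step
        k >>= 1
    return result


def brute_force_dlog(q, base=GEN):
    """Find k in [0, ORDER) with q = k*base by exhaustive search."""
    acc = None
    for k in range(ORDER):
        if acc == q:
            return k
        acc = ec_add(acc, base)
    raise ValueError("q is not in the subgroup generated by base")


if __name__ == "__main__":
    Q = scalar_mul(7, GEN)
    print(Q, brute_force_dlog(Q))
```

For a group of cryptographic size the exhaustive search is hopeless, and the best generic attacks still cost on the order of $\sqrt{\ell}$ group operations.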
Bilinear pairings
Let (G1, +), (G2, +) be two additively-written cyclic groups of prime order #G1 = #G2 = ℓ
(Gτ, ×), a multiplicatively-written cyclic group of order #Gτ = ℓ
A non-degenerate bilinear pairing is a map
    $e : G_1 \times G_2 \to G_\tau$
that satisfies the following conditions:
    - non-degeneracy: $e(P, P) \neq 1_{G_\tau}$ (equivalently, e(P, P) generates Gτ)
    - bilinearity: $e(Q_1 + Q_2, R) = e(Q_1, R) \cdot e(Q_2, R)$ and $e(Q, R_1 + R_2) = e(Q, R_1) \cdot e(Q, R_2)$
    - computability: e can be efficiently computed
Immediate property: for any two integers k1 and k2
    $e(k_1 Q, k_2 R) = e(Q, R)^{k_1 k_2}$
When G1 = G2 we say that the pairing is symmetric; otherwise, if G1 ≠ G2, the pairing is asymmetric.
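The three conditions can be checked mechanically on a toy map. The sketch below is not the Tate pairing and is cryptographically useless (the DLP in its source group is trivial); it only assumes G1 = G2 = (Z_ℓ, +) with ℓ = 11 and takes Gτ to be the order-11 subgroup of $\mathbb{F}_{23}^{*}$ generated by g = 2, with $e(Q, R) = g^{QR}$. All names and parameters are illustrative assumptions.

```python
# A toy symmetric bilinear map, for checking the definitions only.
# Assumed parameters (not from the talk): ELL = 11 divides P_MOD - 1 = 22,
# and G = 2 has multiplicative order 11 modulo 23.
ELL, P_MOD, G = 11, 23, 2


def toy_pairing(q, r):
    """e(Q, R) = G^(Q*R) mod P_MOD, with Q, R taken as elements of (Z_ELL, +)."""
    return pow(G, (q * r) % ELL, P_MOD)


def is_bilinear():
    """Exhaustively verify bilinearity in both arguments."""
    for a in range(ELL):
        for b in range(ELL):
            for c in range(ELL):
                left = toy_pairing((a + b) % ELL, c)
                right = toy_pairing(a, c) * toy_pairing(b, c) % P_MOD
                if left != right:
                    return False
                left = toy_pairing(c, (a + b) % ELL)
                right = toy_pairing(c, a) * toy_pairing(c, b) % P_MOD
                if left != right:
                    return False
    return True


def scalar_property_holds(q, r, k1, k2):
    """Check e(k1*Q, k2*R) == e(Q, R)^(k1*k2)."""
    lhs = toy_pairing(k1 * q % ELL, k2 * r % ELL)
    rhs = pow(toy_pairing(q, r), k1 * k2, P_MOD)
    return lhs == rhs


if __name__ == "__main__":
    print(toy_pairing(1, 1) != 1)            # non-degeneracy for the generator P = 1
    print(is_bilinear())                     # bilinearity
    print(scalar_property_holds(3, 5, 4, 9)) # immediate property
```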
Pairings in cryptography
At first, used to attack supersingular elliptic curves
    - Menezes-Okamoto-Vanstone and Frey-Rück attacks, 1993 and 1994
        $\mathrm{DLP}_{G_1} <_P \mathrm{DLP}_{G_\tau}$
        $kP \longmapsto e(kP, P) = e(P, P)^k$
    - for cryptographic applications, we will also require the DLP in Gτ to be hard
One-round three-party key agreement (Joux, 2000)
Identity-based encryption
    - Boneh–Franklin, 2001
    - Sakai–Kasahara, 2001
Short digital signatures
    - Boneh–Lynn–Shacham, 2001
    - Zhang–Safavi-Naini–Susilo, 2004
...
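Both the MOV-style reduction and the one-round three-party key agreement follow directly from bilinearity. The sketch below illustrates them on an insecure toy map, assuming G1 = (Z_11, +) with generator P = 1 and $e(Q, R) = 2^{QR} \bmod 23$; the parameters and the helper names `public_share`, `shared_key` and `mov_image` are hypothetical and serve only to show the algebra.

```python
# Toy illustration of Joux's one-round tripartite key agreement and of the
# map kP -> e(kP, P) = e(P, P)^k used by the MOV / Frey-Rueck reduction.
# Assumed parameters (not from the talk): ELL = 11, P_MOD = 23, G = 2, GEN = 1.
ELL, P_MOD, G = 11, 23, 2
GEN = 1  # generator P of the toy group (Z_ELL, +)


def toy_pairing(q, r):
    """e(Q, R) = G^(Q*R) mod P_MOD; bilinear but cryptographically useless."""
    return pow(G, (q * r) % ELL, P_MOD)


def public_share(secret):
    """Value broadcast by a party holding `secret`: secret * P."""
    return (secret * GEN) % ELL


def shared_key(my_secret, share_1, share_2):
    """Combine the two received shares: e(share_1, share_2)^my_secret."""
    return pow(toy_pairing(share_1, share_2), my_secret, P_MOD)


def mov_image(kp):
    """Transfer a DLP instance kP in G1 to e(P, P)^k in the target group."""
    return toy_pairing(kp, GEN)


if __name__ == "__main__":
    a, b, c = 3, 5, 7                      # secrets of the three parties
    A, B, C = (public_share(s) for s in (a, b, c))
    # Each party computes e(P, P)^(abc) from its own secret and two shares.
    print(shared_key(a, B, C), shared_key(b, A, C), shared_key(c, A, B))
```

Each party sends a single message, which is why the protocol needs only one round; without a pairing, three-party Diffie-Hellman needs two.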
But... Why should one bother implementing pairings in hardware?
Some important breakthroughs on pairing computations
1985: Elliptic curve cryptography is independently invented by V. Miller and N. Koblitz
1986: V. Miller devises an iterative algorithm for computing pairings [unpublished report until it appeared in the Journal of Cryptology, 2004]
1993: MOV attack [Menezes-Okamoto-Vanstone IEEE TIT]
2002: BKLS algorithm [Barreto-Kim-Lynn-Scott Crypto 2002]
2003: Simplifications of the Miller loop and final exponentiation computation [Duursma and Lee Asiacrypt’03]
2006: eta pairing [Hess-Smart-Vercauteren IEEE TIT]
2007: ηT pairing [Barreto-Galbraith-Ó hÉigeartaigh-Scott DCC’07]
2010: optimal pairings [Vercauteren IEEE TIT]
A brief recount of pairing-friendly ordinary curves
MNT curves: [Miyaji-Nakabayashi-Takano IEICE 2001]
A general method for constructing curves of prescribed embedding degree: [Cocks and Pinch, unpublished 2001]
BLS curves: [Barreto-Lynn-Scott 2002]
BN Curves: [Barreto and Naehrig SAC’05]
BW Curves: [Brezing and Weng DCC’05]
Freeman Curves: [Freeman ANTS’06]
KSS curves: [Kachisa-Schaefer-Scott Pairing’08]
Unfortunately, we don’t know how to find pairing-friendly ordinary curves over binary fields
But... Why should one bother implementing pairings in hardware?
Pairing computation is not very well suited to general-purpose processors
There exist specific targets, one of the most prominent being smart cards
Hardware may be the fastest/most efficient way to implement pairings.
However, if a pairing hardware accelerator is going to be attractive at all, a significant performance improvement should be observed with respect to software implementations.
A brief recount and notes on the first pairing hardware accelerators
First designs appeared circa 2005: [Grabher & Page CHES’05], [Kerins et al. CHES’05]
Historically, the first designs targeted the low-security Tate pairing over supersingular elliptic curves defined on ternary fields [a typical selection was $\mathbb{F}_{3^{97}}$]
First designs did not take advantage of the inverse Frobenius map [extracting cube roots was considered expensive]
Curiously, the first hardware implementations of the Tate pairing defined on binary field extensions appeared one year later: [Keller et al. ARC’06], [Shu et al. FPT’06].
The first hardware designs for the Tate pairing over ordinary curves defined on large prime fields did not come out until 2009: [Fan et al. CHES’09], [Kammler et al. CHES’09]
FPGA General architecture
General Xilinx Virtex 5 Slice architecture
Xilinx FPGA Families
                      Virtex-5        Virtex-4              Virtex II Pro    Spartan 3 & 3E
Logic Cells           up to 330K      12K-200K              3K-99K           1.7K-74K
BRAM (18 Kbits each)  576             36-512                12-444           4-104
Multipliers           32-192 (1)      32-512                12-444           4-104
DCM                   up to 18        4-20                  4-12             2-18
IOBs                  up to 1200      240-960               204-1164         63-633
DSP Slices            32-192          32-192                -                -
PowerPC Blocks        N/A             0-2                   0-2              -
Max. freq.            550 MHz         500 MHz               547 MHz          up to 300 MHz
Technology            1.0 V, 65 nm,   1.2 V, 90 nm,         1.5 V, 130 nm,   1.2 V, 90 nm,
                      copper CMOS     triple-oxide process  9-layer CMOS     triple-oxide process
Price                 ≈ $400 USD      From $300             From $139        From $2 up to $85
(1) 25 × 18 embedded multipliers
Design Methodology for FPGA designs
Measures of performance in reconfigurable hardware devices
Computational time, defined as:
$\dfrac{\text{\# of clock cycles}}{\text{clock frequency}}$
Throughput, defined as:
$\dfrac{\text{\# of bits processed} \cdot \text{clock frequency}}{\text{\# of clock cycles}}$
Latency: # of clock cycles required for producing the first computation
Amount of hardware resources utilized by the design, including slices, dedicated memories, DSP slices, etc.
Time-Area product
Power consumption, energy consumption, ...
In the case of cryptographic designs: side-channel resistance
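The first two definitions and the time-area product are simple ratios, so a short Python sketch may help when comparing designs. The class name `DesignReport` and every number below are hypothetical placeholders, not figures from any design discussed in this talk.

```python
from dataclasses import dataclass


@dataclass
class DesignReport:
    """Figures for one design; all values used below are made-up placeholders."""
    clock_cycles: int      # cycles needed for one complete computation
    frequency_hz: float    # clock frequency in Hz
    bits_processed: int    # bits handled per computation
    slices: int            # area, counted in FPGA slices

    def time_s(self) -> float:
        # computational time = cycles / frequency
        return self.clock_cycles / self.frequency_hz

    def throughput_bps(self) -> float:
        # throughput = bits * frequency / cycles
        return self.bits_processed * self.frequency_hz / self.clock_cycles

    def time_area(self) -> float:
        # time-area product, here in slice-seconds
        return self.time_s() * self.slices


# Hypothetical numbers, chosen only to keep the arithmetic easy to follow.
report = DesignReport(clock_cycles=20_000, frequency_hz=100e6,
                      bits_processed=128, slices=5_000)
print(report.time_s())          # seconds per computation
print(report.throughput_bps())  # bits per second
print(report.time_area())       # slice-seconds
```

A lower time-area product indicates a better trade-off between speed and resources, which is why it is often reported next to the raw computation time.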
Parallel techniques in hardware
Parallel techniques in hardware: AES example
The Tate Pairing over Supersingular elliptic curves
We first define
- $\mathbb{F}_q$, a finite field, with $q = 2^m$ or $3^m$
- $E$, an elliptic curve defined over $\mathbb{F}_q$
- $\ell$, a large prime factor of $\#E(\mathbb{F}_q)$
$G_1 = E(\mathbb{F}_q)[\ell]$, the $\mathbb{F}_q$-rational $\ell$-torsion of $E$:
$G_1 = \{P \in E(\mathbb{F}_q) \mid \ell P = \mathcal{O}\}$
$G_\tau = \mu_\ell$, the group of $\ell$-th roots of unity in $\mathbb{F}^\times_{q^k}$:
$G_\tau = \{U \in \mathbb{F}^\times_{q^k} \mid U^\ell = 1\}$
$k$ is the embedding degree, the smallest integer such that $\mu_\ell \subseteq \mathbb{F}^\times_{q^k}$
- usually large for ordinary elliptic curves
- bounded in the case of supersingular elliptic curves (4 in characteristic 2; 6 in characteristic 3; and 2 in characteristic > 3)
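Since $\mu_\ell \subseteq \mathbb{F}^\times_{q^k}$ exactly when $\ell$ divides $q^k - 1$, the embedding degree is the multiplicative order of $q$ modulo $\ell$. The sketch below checks the bounds 4 and 6 on two toy supersingular curves whose points can be counted by hand. The function name `embedding_degree` and the toy curves are illustrative choices, not part of the talk.

```python
def embedding_degree(q: int, ell: int, k_max: int = 100) -> int:
    """Return the smallest k >= 1 with ell | q**k - 1 (the order of q modulo ell)."""
    acc = 1
    for k in range(1, k_max + 1):
        acc = (acc * q) % ell
        if acc == 1:
            return k
    raise ValueError("no embedding degree found up to k_max")


# Toy supersingular curves, small enough to count points by hand:
#   y^2 + y = x^3 + x   over F_2 has 5 points, so ell = 5
#   y^2 = x^3 - x + 1   over F_3 has 7 points, so ell = 7
k2 = embedding_degree(2, 5)
k3 = embedding_degree(3, 7)
print(k2, k3)

# For m = 97 in characteristic 3 the group order is 3^97 + 1 +/- 3^49
# (the sign depends on the curve). Either value divides 3^(6*97) - 1,
# so every prime factor ell of it has an embedding degree dividing 6.
N = 3**97 + 1 + 3**49
print(pow(3, 6 * 97, N) == 1)
```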
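Security considerations
$e : E(\mathbb{F}_q)[\ell] \times E(\mathbb{F}_q)[\ell] \to \mu_\ell \subseteq \mathbb{F}^\times_{q^k}$
Discrete logarithm in $G_1 = E(\mathbb{F}_q)[\ell]$ (Pollard’s ρ):
$\sqrt{\ell} \approx \sqrt{q} = \exp\left(\tfrac{1}{2} \cdot \ln q\right)$
Discrete logarithm in $G_2 = \mu_\ell \subseteq \mathbb{F}^\times_{q^k}$ (FFS or NFS):
$\exp\left(c \cdot (\ln q^k)^{1/3} \cdot (\ln \ln q^k)^{2/3}\right)$
The discrete logarithm problem is usually easier in $G_2$ than in $G_1$
- current security: $\sim 2^{80}$, equivalent to 80-bit symmetric encryption or RSA-1024
- recommended security: $\sim 2^{128}$ (AES-128, RSA-3072)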
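The two cost formulas can be evaluated numerically to see why the second group is usually the weaker one. This is only a rough sketch: the $o(1)$ term of the subexponential estimate is dropped, and the constant $c = (32/9)^{1/3}$ for the function field sieve is an assumption taken from the usual heuristic analysis, not a value given on the slide.

```python
import math


def pollard_rho_bits(p: int, m: int) -> float:
    """log2 of sqrt(q) for q = p**m: generic attack cost in G1, taking ell close to q."""
    return 0.5 * m * math.log2(p)


def subexp_bits(p: int, m: int, k: int, c: float) -> float:
    """log2 of exp(c * (ln Q)^(1/3) * (ln ln Q)^(2/3)) for Q = q**k, o(1) term dropped."""
    ln_Q = k * m * math.log(p)
    return c * ln_Q ** (1 / 3) * math.log(ln_Q) ** (2 / 3) / math.log(2)


C_FFS = (32 / 9) ** (1 / 3)   # assumed constant for the function field sieve

# Characteristic 3, m = 97, embedding degree k = 6
rho = pollard_rho_bits(3, 97)
ffs = subexp_bits(3, 97, 6, C_FFS)
print(round(rho, 1), round(ffs, 1))
```

With these inputs the subexponential estimate comes out below the Pollard ρ estimate, so the security level of the pairing is set by the finite field side.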
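Security considerations for Symmetric Pairings
$e : E(\mathbb{F}_{p^m})[\ell] \times E(\mathbb{F}_{p^m})[\ell] \to \mu_\ell \subseteq \mathbb{F}^\times_{p^{km}}$
The discrete logarithm problem should be hard in both $G_1$ and $G_\tau$
Base field ($\mathbb{F}_{p^m}$)        $\mathbb{F}_{2^m}$      $\mathbb{F}_{3^m}$
Lower security ($\sim 2^{64}$)      m = 239     m = 97
Medium security ($\sim 2^{80}$)     m = 373     m = 163
Higher security ($\sim 2^{128}$)    m = 1103    m = 503
$\mathbb{F}_{2^m}$: simpler finite field arithmetic
$\mathbb{F}_{3^m}$: smaller field extension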
Barreto–Naehrig Curves
Defined by the equation $E : y^2 = x^3 + b$, where $b \neq 0$. Their embedding degree $k$ is equal to 12. The characteristic $p$ of the prime field, the group order $r$, and the trace of Frobenius $t_r$ of the curve are parametrized as follows:
$p(t) = 36t^4 + 36t^3 + 24t^2 + 6t + 1,$
$r(t) = 36t^4 + 36t^3 + 18t^2 + 6t + 1,$   (1)
$t_r(t) = 6t^2 + 1,$
where $t \in \mathbb{Z}$ is an arbitrary integer such that $p = p(t)$ and $r = r(t)$ are both prime numbers.
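The parametrization in (1) can be checked on a toy parameter. The sketch below evaluates the three polynomials for $t = 1$, tests primality by trial division, and confirms that $r = p + 1 - t_r$ and that the order of $p$ modulo $r$ is 12. The helper names are illustrative, and $t = 1$ is far too small for any real use.

```python
def bn_params(t: int):
    """Evaluate the BN polynomials p(t), r(t), tr(t) from equation (1)."""
    p = 36 * t**4 + 36 * t**3 + 24 * t**2 + 6 * t + 1
    r = 36 * t**4 + 36 * t**3 + 18 * t**2 + 6 * t + 1
    tr = 6 * t**2 + 1
    return p, r, tr


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def order_mod(a: int, n: int) -> int:
    """Multiplicative order of a modulo n (assumes gcd(a, n) = 1)."""
    k, acc = 1, a % n
    while acc != 1:
        acc = (acc * a) % n
        k += 1
    return k


p, r, tr = bn_params(1)           # toy parameter
print(p, r, tr)
print(is_prime(p), is_prime(r))   # both must be prime
print(r == p + 1 - tr)            # relation between p, r and the trace
print(order_mod(p, r))            # embedding degree
```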
Barreto–Naehrig Curves
Let $E[r]$ denote the $r$-torsion subgroup of $E$ and $\pi_p$ be the Frobenius endomorphism $\pi_p : E \to E$ given by $\pi_p(x, y) = (x^p, y^p)$. We define
$G_1 = E(\mathbb{F}_p)[r]$,
$G_2 \subseteq E(\mathbb{F}_{p^{12}})[r]$,
$G_\tau = \mu_r \subset \mathbb{F}^*_{p^{12}}$ (i.e. the group of $r$-th roots of unity).
The optimal ate pairing on the BN curve $E$ is given as
$a_{opt} : G_2 \times G_1 \to G_\tau$
$(Q, P) \mapsto \left( f_{6t+2,Q}(P) \cdot l_{[6t+2]Q,\,\pi_p(Q)}(P) \cdot l_{[6t+2]Q+\pi_p(Q),\,-\pi_p^2(Q)}(P) \right)^{\frac{p^{12}-1}{r}}$
In practice, pairing computations can be restricted to points $P$ and $Q'$ that belong to $E(\mathbb{F}_p)$ and $E'(\mathbb{F}_{p^2})$, respectively, where $E'/\mathbb{F}_{p^2} : y^2 = x^3 + b/\xi$.
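The exponent $(p^{12}-1)/r$ is commonly split as $(p^6 - 1)(p^2 + 1) \cdot (p^4 - p^2 + 1)/r$, often called the easy and hard parts. The slide does not spell this out, so the sketch below is an illustration under that assumption; it checks the factorization on the toy values $p(1) = 103$ and $r(1) = 97$ obtained from the BN polynomials.

```python
def final_exponent_parts(p: int, r: int):
    """Split (p**12 - 1) / r into an 'easy' part and a 'hard' part."""
    easy = (p**6 - 1) * (p**2 + 1)
    phi12 = p**4 - p**2 + 1          # 12th cyclotomic polynomial evaluated at p
    if phi12 % r != 0:
        raise ValueError("r does not divide p^4 - p^2 + 1")
    hard = phi12 // r
    return easy, hard


p, r = 103, 97                       # toy BN values for t = 1
easy, hard = final_exponent_parts(p, r)
print(easy * hard == (p**12 - 1) // r)
```

The easy part only needs Frobenius maps, a few multiplications and one inversion in $\mathbb{F}_{p^{12}}$, which is why implementations usually treat the two factors separately.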
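Computational costs of the tower extension field arithmetic
Here $a$, $m$, $s$, $i$ denote addition, multiplication, squaring and inversion in $\mathbb{F}_p$; tilded symbols denote the same operations in $\mathbb{F}_{p^2}$; $m_\beta$ and $m_\xi$ denote multiplication by the constants $\beta$ and $\xi$.
Field                            Add/Sub            Mult                                     Squaring                                           Inversion
$\mathbb{F}_{p^2}$               $\tilde{a} = 2a$   $\tilde{m} = 3m + 3a + m_\beta$          $\tilde{s} = 2m + 3a + m_\beta$                    $\tilde{i} = 4m + m_\beta + 2a + i$
$\mathbb{F}_{p^6}$               $3\tilde{a}$       $6\tilde{m} + 2m_\xi + 15\tilde{a}$      $2\tilde{m} + 3\tilde{s} + 2m_\xi + 8\tilde{a}$    $9\tilde{m} + 3\tilde{s} + 4m_\xi + 4\tilde{a} + \tilde{i}$
$\mathbb{F}_{p^{12}}$            $6\tilde{a}$       $18\tilde{m} + 6m_\xi + 60\tilde{a}$     $12\tilde{m} + 4m_\xi + 45\tilde{a}$               $25\tilde{m} + 9\tilde{s} + 12m_\xi + 61\tilde{a} + \tilde{i}$
$G_{\Phi_6}(\mathbb{F}_{p^2})$   $6\tilde{a}$       $18\tilde{m} + 6m_\xi + 60\tilde{a}$     $9\tilde{s} + 4m_\xi + 30\tilde{a}$                Conjugate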
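The $\mathbb{F}_{p^2}$ multiplication cost in the table, three multiplications in $\mathbb{F}_p$ plus one multiplication by $\beta$, is what a Karatsuba-style product gives. The sketch below assumes the representation $\mathbb{F}_{p^2} = \mathbb{F}_p[u]/(u^2 - \beta)$ and uses toy values for $p$ and $\beta$; it matches the multiplication count only, and the addition count is not tuned.

```python
def fp2_mul(x, y, p, beta):
    """Multiply x = x0 + x1*u and y = y0 + y1*u in F_p[u]/(u^2 - beta) with 3 F_p products."""
    x0, x1 = x
    y0, y1 = y
    v0 = x0 * y0 % p                     # multiplication 1
    v1 = x1 * y1 % p                     # multiplication 2
    cross = (x0 + x1) * (y0 + y1) % p    # multiplication 3
    c0 = (v0 + beta * v1) % p            # one multiplication by the constant beta
    c1 = (cross - v0 - v1) % p           # equals x0*y1 + x1*y0
    return c0, c1


def fp2_mul_schoolbook(x, y, p, beta):
    """Reference version with 4 F_p products, used only to check the result."""
    x0, x1 = x
    y0, y1 = y
    return (x0 * y0 + beta * x1 * y1) % p, (x0 * y1 + x1 * y0) % p


p, beta = 103, -1    # toy values; -1 is a non-residue because 103 = 3 (mod 4)
a, b = (5, 17), (42, 99)
print(fp2_mul(a, b, p, beta) == fp2_mul_schoolbook(a, b, p, beta))
```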
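Optimal ate pairing algorithm
Input: $P \in G_1$ and $Q \in G_2$.
Output: $a_{opt}(Q, P)$.
 1. Write $s = 6t + 2$ as $s = \sum_{i=0}^{L-1} s_i 2^i$, where $s_i \in \{-1, 0, 1\}$;
 2. $T \leftarrow Q$, $f \leftarrow 1$;
 3. for $i = L - 2$ down to 0 do
 4.     $f \leftarrow f^2 \cdot l_{T,T}(P)$; $T \leftarrow 2T$;
 5.     if $s_i = -1$ then
 6.         $f \leftarrow f \cdot l_{T,-Q}(P)$; $T \leftarrow T - Q$;
 7.     else if $s_i = 1$ then
 8.         $f \leftarrow f \cdot l_{T,Q}(P)$; $T \leftarrow T + Q$;
 9.     end if
10. end for
11. $Q_1 \leftarrow \pi_p(Q)$; $Q_2 \leftarrow \pi_{p^2}(Q)$;
12. $f \leftarrow f \cdot l_{T,Q_1}(P)$; $T \leftarrow T + Q_1$;
13. $f \leftarrow f \cdot l_{T,-Q_2}(P)$; $T \leftarrow T - Q_2$;
14. $f \leftarrow f^{(p^{12}-1)/r}$;
15. return $f$;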
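Step 1 asks for a signed-digit expansion of $s = 6t + 2$. One common choice is the non-adjacent form (NAF), sketched below; the algorithm on the slide does not say which recoding is used, so NAF is an assumption here, and the value of `t` is an arbitrary sparse integer picked only for illustration.

```python
def signed_digits(s: int):
    """Non-adjacent form of s > 0: digits s_i in {-1, 0, 1}, least significant first."""
    if s <= 0:
        raise ValueError("this sketch only handles s > 0")
    digits = []
    while s > 0:
        if s & 1:
            d = 2 - (s % 4)      # +1 if s = 1 (mod 4), -1 if s = 3 (mod 4)
            s -= d
        else:
            d = 0
        digits.append(d)
        s >>= 1
    return digits


t = 2**62 - 2**54 + 2**44        # arbitrary sparse value, an assumption for illustration
s = 6 * t + 2
digits = signed_digits(s)
L = len(digits)
nonzero = sum(1 for d in digits if d != 0)
print(L, nonzero)                # loop length and number of extra line evaluations
```

Every nonzero digit below the top one triggers one of the additions in steps 5 to 8, so a sparse expansion directly reduces the work inside the loop.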
$f = g + hw \in \mathbb{F}_{p^{12}}$, with $g, h \in \mathbb{F}_{p^6}$,
but also $g = g_0 + g_1 v + g_2 v^2$, $h = h_0 + h_1 v + h_2 v^2$, where $g_i, h_i \in \mathbb{F}_{p^2}$, for $i = 0, 1, 2$;
hence, we can write $f \in \mathbb{F}_{p^{12}}$ as
$f = g + hw = g_0 + h_0 W + g_1 W^2 + h_1 W^3 + g_2 W^4 + h_2 W^5.$
$\beta = 1$, $\xi = u$, $\gamma = v$
Since $p \equiv 1 \pmod{12}$, we can build the tower up to the twelfth extension by adjoining irreducible binomials only.
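Moving between the tower form $(g, h)$ and the flat coefficients of $1, W, \dots, W^5$ is only a re-indexing, assuming $W = w$ and $w^2 = v$ as the labels suggest. The two helper functions below are illustrative names; the coefficients are treated as opaque $\mathbb{F}_{p^2}$ values.

```python
def tower_to_flat(g, h):
    """Map f = g + h*w, g = (g0, g1, g2), h = (h0, h1, h2), to coefficients of 1, W, ..., W^5."""
    return [g[0], h[0], g[1], h[1], g[2], h[2]]


def flat_to_tower(c):
    """Inverse map: even-index coefficients form g, odd-index coefficients form h."""
    return tuple(c[0::2]), tuple(c[1::2])


g = ("g0", "g1", "g2")           # placeholders standing for F_{p^2} elements
h = ("h0", "h1", "h2")
flat = tower_to_flat(g, h)
print(flat)
print(flat_to_tower(flat) == (g, h))
```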
Supersingular elliptic curves vs. Barreto–Naehrig curves
Supersingular curves:
- Definition: $E/\mathbb{F}_3 : y^2 = x^3 - x + b$, $b \neq 0$
- Supersingular curve ⇒ simpler curve arithmetic (efficient tripling formulae)
- Distortion map, modified pairing: $\delta : E(\mathbb{F}_q)[\ell] \to E(\mathbb{F}_{q^k})[\ell]$, $\hat{e}(P, Q) = e(P, \delta(Q))$ ⇒ symmetric pairing
- Small characteristic field arithmetic ⇒ no carry, better suited to hardware implementation
- Small embedding degree ($k = 6$) ⇒ larger field of definition for the same security level; for 128 bits of security, $\mathbb{F}_q$ with $q \approx 3^{500}$
Barreto–Naehrig curves:
- Definition: $E/\mathbb{F}_p : y^2 = x^3 + b$, $b \neq 0$, with $p = 36\alpha^4 - 36\alpha^3 + 24\alpha^2 - 6\alpha + 1$
- Ordinary curve
- No distortion map (BN cannot be used with all protocols)
- Modular arithmetic
- Optimal embedding degree ($k = 12$); for 128 bits of security, $\mathbb{F}_p$ with $p$ a 256-bit prime
Computing the reduced Tate pairing
$e : E(\mathbb{F}_{p^m})[\ell] \times E(\mathbb{F}_{p^m})[\ell] \to \mu_\ell \subseteq \mathbb{F}^\times_{p^{km}}$
Arithmetic over $\mathbb{F}_{p^m}$:
- polynomial basis: $\mathbb{F}_{p^m} \cong \mathbb{F}_p[x]/(f(x))$
- $f(x)$, degree-$m$ polynomial irreducible over $\mathbb{F}_p$
Arithmetic over $\mathbb{F}^\times_{p^{km}}$:
- tower-field representation
- only arithmetic over the underlying field $\mathbb{F}_{p^m}$
Operations over $\mathbb{F}_{p^m}$:
- $O(m)$ additions / subtractions
- $O(m)$ multiplications
- $O(m)$ Frobenius maps ($a \mapsto a^p$, i.e. squarings or cubings)
- 1 inversion
A first idea: an all-in-one unified operator:
- shared resources
- scalable architecture
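The Frobenius map is much cheaper than a general multiplication because it is linear over $\mathbb{F}_p$: in characteristic 3, cubing $\sum a_i x^i$ gives $\sum a_i x^{3i}$, followed by a reduction modulo $f$. The sketch below checks this on a toy field with $m = 3$ and $f(x) = x^3 + 2x + 1$. Both the modulus and the function names are choices made for the example, not parameters from the talk.

```python
P = 3
F = [1, 2, 0, 1]   # f(x) = x^3 + 2x + 1, coefficients from low to high degree (toy, m = 3)


def reduce_mod_f(c):
    """Reduce a coefficient list modulo f(x) and modulo 3."""
    c = [ci % P for ci in c]
    m = len(F) - 1
    for d in range(len(c) - 1, m - 1, -1):
        top = c[d]
        if top:
            # subtract top * x^(d-m) * f(x), which clears the degree-d term
            for j in range(m + 1):
                c[d - m + j] = (c[d - m + j] - top * F[j]) % P
    return (c + [0] * m)[:m]


def mul(a, b):
    """Schoolbook product in F_3[x] followed by reduction."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return reduce_mod_f(c)


def frobenius(a):
    """a -> a^3 by spreading coefficients: sum a_i x^i becomes sum a_i x^(3i)."""
    c = [0] * (3 * (len(a) - 1) + 1)
    for i, ai in enumerate(a):
        c[3 * i] = ai
    return reduce_mod_f(c)


a = [2, 1, 1]      # the element 2 + x + x^2
print(frobenius(a) == mul(mul(a, a), a))
```

In hardware the spreading and reduction collapse into a fixed network of $\mathbb{F}_3$ additions, with no multiplier involved.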
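Arithmetic over $\mathbb{F}_{3^m}$
$f \in \mathbb{F}_3[x]$: degree-$m$ irreducible polynomial over $\mathbb{F}_3$
$$f = x^m + f_{m-1}x^{m-1} + \cdots + f_1 x + f_0$$
$\mathbb{F}_{3^m} \cong \mathbb{F}_3[x]/(f)$
$a \in \mathbb{F}_{3^m}$:
$$a = a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$$
Each element of $\mathbb{F}_3$ is stored using two bits (such elements are also called trits)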
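The two-bit storage of each coefficient can be illustrated with a small software sketch. The encoding chosen here (0, 1, 2 stored as 00, 01, 10) and the helper names `pack` and `unpack` are assumptions for illustration; the slides do not say which encoding the hardware uses.

```python
M = 97  # extension degree used in the example later in the talk

def pack(coeffs):
    """Pack a_0..a_{m-1} (each 0, 1 or 2) into one integer, two bits per trit."""
    word = 0
    for i, c in enumerate(coeffs):
        assert c in (0, 1, 2)
        word |= c << (2 * i)          # assumed encoding: 0 -> 00, 1 -> 01, 2 -> 10
    return word

def unpack(word, m=M):
    """Recover the m coefficients from their two-bit fields."""
    return [(word >> (2 * i)) & 0b11 for i in range(m)]

a = [(5 * i + 1) % 3 for i in range(M)]   # arbitrary sample element
word = pack(a)
```

With this layout an element of $\mathbb{F}_{3^{97}}$ occupies at most $2 \cdot 97 = 194$ bits.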
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (37 / 91)
Addition over $\mathbb{F}_{3^m}$
$$r = a + b = (a_{m-1} + b_{m-1})x^{m-1} + \cdots + (a_1 + b_1)x + (a_0 + b_0)$$
- coefficient-wise additions over $\mathbb{F}_3$: $r_i = (a_i + b_i) \bmod 3$
- addition over $\mathbb{F}_3$: small look-up tables
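A minimal software sketch of the coefficient-wise addition, with a Python dictionary standing in for the small hardware look-up table. The names `ADD3` and `add_f3m` are illustrative, not taken from the talk.

```python
# Look-up table for addition in F_3, indexed by the pair (a_i, b_i).
ADD3 = {(x, y): (x + y) % 3 for x in range(3) for y in range(3)}

def add_f3m(a, b):
    """Coefficient-wise addition of two elements of F_3[x]/(f)."""
    assert len(a) == len(b)
    return [ADD3[(x, y)] for x, y in zip(a, b)]

a = [1, 2, 0, 2]       # coefficients a_0..a_3 of a toy element
b = [2, 2, 1, 0]
r = add_f3m(a, b)      # [0, 1, 1, 2]
```

No reduction modulo $f$ is needed, since the sum of two polynomials of degree below $m$ still has degree below $m$.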
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (38 / 91)
Addition, subtraction and accumulation over $\mathbb{F}_{3^m}$
- sign selection: multiplication by 1 or 2, since $-a \equiv 2a \pmod 3$
- feedback loop for accumulation
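The sign selection and the feedback loop can be mimicked in software as follows. This is only a behavioural sketch; the function names are hypothetical and the real operator is a combinational circuit with a register in the loop.

```python
def scale(a, s):
    """Multiply every coefficient by s in {1, 2}; s = 2 negates, since -a = 2a (mod 3)."""
    return [(s * x) % 3 for x in a]

def add_sub(a, b, subtract=False):
    """Return a + b, or a - b when subtract is set (sign selection applied to b)."""
    sb = scale(b, 2 if subtract else 1)
    return [(x + y) % 3 for x, y in zip(a, sb)]

def accumulate(terms):
    """Feedback loop: acc <- acc +/- term for each (term, subtract) pair."""
    acc = [0] * len(terms[0][0])
    for term, subtract in terms:
        acc = add_sub(acc, term, subtract)
    return acc

a = [1, 2, 0]
b = [2, 2, 1]
d = add_sub(a, b, subtract=True)   # a - b = [2, 0, 2]
```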
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (39 / 91)
Multiplication over $\mathbb{F}_{3^m}$
Parallel-serial multiplication
- multiplicand loaded in a parallel register
- multiplier loaded in a shift register
Most significant coefficients first (Horner scheme)
$D$ coefficients processed at each clock cycle: $\lceil m/D \rceil$ cycles per multiplication
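The Horner scheme above can be simulated cycle by cycle. The sketch below assumes $m = 97$, $D = 3$ and the trinomial $f = x^{97} + x^{12} + 2$ quoted later in the talk; the function names are illustrative, and `mul_ref` is a plain schoolbook product included only as a cross-check.

```python
from math import ceil

M, D = 97, 3   # f = x^97 + x^12 + 2, hence x^97 = 2*x^12 + 1 over F_3

def reduce_mod_f(p):
    """Reduce a coefficient list (index = degree) modulo f and modulo 3."""
    p = [c % 3 for c in p]
    for d in range(len(p) - 1, M - 1, -1):
        c, p[d] = p[d], 0
        p[d - M + 12] = (p[d - M + 12] + 2 * c) % 3
        p[d - M] = (p[d - M] + c) % 3
    return (p + [0] * M)[:M]

def mul_serial(a, b):
    """a*b mod f, consuming D coefficients of b per simulated clock cycle."""
    cycles = ceil(M / D)
    b = b + [0] * (cycles * D - M)       # pad the multiplier at the top
    acc = [0] * M
    for cyc in range(cycles):
        lo = (cycles - 1 - cyc) * D      # lowest coefficient index of this chunk
        step = [0] * D + acc             # acc * x^D is a plain shift
        for j in range(D):               # D partial products b_j * a * x^j
            for i, ai in enumerate(a):
                step[i + j] += b[lo + j] * ai
        acc = reduce_mod_f(step)         # degree to reduce is at most M + D - 1
    return acc, cycles

def mul_ref(a, b):
    """Schoolbook product followed by one reduction, used as a reference."""
    p = [0] * (2 * M - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return reduce_mod_f(p)
```

For these parameters the simulated multiplier takes $\lceil 97/3 \rceil = 33$ iterations.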
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (40 / 91)
Multiplication over $\mathbb{F}_{3^m}$
Example for $D = 3$ (3 coefficients per iteration): [figure not reproduced]
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (41 / 91)
Multiplication over $\mathbb{F}_{3^m}$
Computing the partial products $b_j \cdot a$:
- coefficient-wise multiplication over $\mathbb{F}_3$: $(b_j \cdot a_i) \bmod 3$
- multiplications over $\mathbb{F}_3$: small look-up tables
Multiplication by $x^j$: simple shift (only wires)
Modulo $f$ reduction:
- $f = x^m + f_{m-1}x^{m-1} + \cdots + f_1 x + f_0$ gives $x^m \equiv (-f_{m-1})x^{m-1} + \cdots + (-f_1)x + (-f_0) \pmod{f}$
- highest degree of polynomial to reduce: $m + D - 1$
- if $f$ is carefully selected (e.g. a trinomial or pentanomial), only a few multiplications and additions over $\mathbb{F}_3$
- example for $m = 97$: $f = x^{97} + x^{12} + 2$
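As a worked instance of the reduction rule for the example trinomial (a sketch following directly from $f = x^{97} + x^{12} + 2$; the symbols $c$ and $t$ are introduced here for illustration):

```latex
\begin{aligned}
x^{97} &\equiv -x^{12} - 2 \equiv 2x^{12} + 1 \pmod{f},\\
c\,x^{97+t} &\equiv 2c\,x^{12+t} + c\,x^{t} \pmod{f}, \qquad 0 \le t \le D - 1 .
\end{aligned}
```

Each coefficient $c$ above degree $m - 1$ therefore costs one multiplication by 2 and two additions over $\mathbb{F}_3$. With $D = 3$ there are at most three such coefficients per iteration.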
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (42 / 91)
Frobenius map over $\mathbb{F}_{3^m}$: cubing
Let $A$ be an arbitrary element of the field $\mathbb{F}_{3^m}$, written in the canonical basis as $A = \sum_{i=0}^{m-1} a_i x^i$, $a_i \in \mathbb{F}_3$. Then, with $m = 3u + r$, the polynomial cubing $A^3$ can be computed as
$$A^3 = \left(\sum_{i=0}^{m-1} a_i x^i\right)^3 = \sum_{i=0}^{m-1} a_i x^{3i} = \sum_{i=0}^{u} a_i x^{3i} + \sum_{i=u+1}^{2u+r-1} a_i x^{3i} + \sum_{i=2u+r}^{3u+r-1} a_i x^{3i}$$
$$= C_0 + x^{3u+r} C_1 + x^{6u+2r} C_2 = C_0 + x^m C_1 + x^{2m} C_2.$$
Symbolic computation of the reduction: each coefficient of the result is a linear combination of the $a_i$'s
$$a^3 \bmod f = \sum_{j=0}^{n-1} w_j \cdot \mu_j$$
with $w_j \in \mathbb{F}_3$, $\mu_j \in \mathbb{F}_{3^m}$, and $\mu_{j,i} \in \{0\} \cup \{a_{m-1}, \ldots, a_1, a_0\}$
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (44 / 91)
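The identity $A^3 = \sum a_i x^{3i}$ can be checked numerically. The sketch below assumes $m = 97$ and $f = x^{97} + x^{12} + 2$ (the example used elsewhere in the talk) and compares the "spread then reduce" cubing against two schoolbook multiplications. Function names are illustrative.

```python
M = 97   # f = x^97 + x^12 + 2, hence x^97 = 2*x^12 + 1 over F_3

def reduce_mod_f(p):
    """Reduce a coefficient list (index = degree) modulo f and modulo 3."""
    p = [c % 3 for c in p]
    for d in range(len(p) - 1, M - 1, -1):
        c, p[d] = p[d], 0
        p[d - M + 12] = (p[d - M + 12] + 2 * c) % 3
        p[d - M] = (p[d - M] + c) % 3
    return (p + [0] * M)[:M]

def mul(a, b):
    """Schoolbook product modulo f, used here only as a reference."""
    p = [0] * (2 * M - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return reduce_mod_f(p)

def cube(a):
    """a^3 mod f: move a_i to degree 3i (a_i^3 = a_i in F_3), then reduce."""
    spread = [0] * (3 * (M - 1) + 1)
    spread[::3] = a                  # a must hold exactly M coefficients
    return reduce_mod_f(spread)

a = [(2 * i * i + i + 1) % 3 for i in range(M)]   # arbitrary sample element
```

Cubing never multiplies two coefficients together, which is why the hardware needs only wires, weights and a multi-operand adder.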
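Frobenius map over $\mathbb{F}_{3^m}$
Example for $m = 97$ and $f = x^{97} + x^{12} + 2$:
$$\begin{aligned}
a^3 \bmod f ={} & (a_{32}x^{96} + a_{64}x^{95} + a_{96}x^{94} + \cdots + a_{33}x^{2} + a_{65}x + a_{0}) \times 1 \\
 & + (0 + 0 + a_{88}x^{94} + \cdots + 0 + 0 + a_{89}) \times 1 \\
 & + (0 + 0 + a_{92}x^{94} + \cdots + 0 + 0 + a_{93}) \times 1 \\
 & + (0 + a_{60}x^{95} + 0 + \cdots + 0 + a_{61}x + 0) \times 2 \\
={} & (a_{32}x^{96} + a_{64}x^{95} + a_{96}x^{94} + \cdots + a_{33}x^{2} + a_{65}x + a_{0}) \times 1 \\
 & + (0 + a_{60}x^{95} + a_{88}x^{94} + \cdots + 0 + a_{61}x + a_{89}) \times 1 \\
 & + (0 + a_{60}x^{95} + a_{92}x^{94} + \cdots + 0 + a_{61}x + a_{93}) \times 1
\end{aligned}$$
Required hardware:
- only wires to compute the $\mu_j$'s
- multiplications over $\mathbb{F}_3$ for the weights $w_j$
- multi-operand addition over $\mathbb{F}_{3^m}$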
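The symbolic table in this example can be regenerated by reducing each monomial $x^{3i}$ modulo $f$ and recording where it lands. The sketch below is one way to do that in software; the dictionary `contrib` and its layout are assumptions made for illustration.

```python
M = 97   # f = x^97 + x^12 + 2, hence x^97 = 2*x^12 + 1 over F_3

def reduce_mod_f(p):
    """Reduce a coefficient list (index = degree) modulo f and modulo 3."""
    p = [c % 3 for c in p]
    for d in range(len(p) - 1, M - 1, -1):
        c, p[d] = p[d], 0
        p[d - M + 12] = (p[d - M + 12] + 2 * c) % 3
        p[d - M] = (p[d - M] + c) % 3
    return (p + [0] * M)[:M]

# contrib[d] lists (weight, i) pairs: coefficient d of a^3 mod f is sum(weight * a_i).
contrib = {d: [] for d in range(M)}
for i in range(M):
    basis = [0] * (3 * i + 1)
    basis[3 * i] = 1                      # the monomial x^(3i)
    for d, w in enumerate(reduce_mod_f(basis)):
        if w:
            contrib[d].append((w, i))
```

For instance, `contrib[95]` comes out as `[(2, 60), (1, 64)]`, matching the $a_{64}x^{95}$ term with weight 1 and the $a_{60}x^{95}$ term with weight 2 in the first form of the example.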
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (45 / 91)
Inverse Frobenius map over $\mathbb{F}_{3^m}$
Let $A$ be an arbitrary element of the field $\mathbb{F}_{3^m}$, written in the canonical basis as
$$A = \sum_{i=0}^{m-1} a_i x^i = \sum_{i=0}^{u} a_{3i} x^{3i} + x \cdot \sum_{i=0}^{u+r-2} a_{3i+1} x^{3i} + x^2 \cdot \sum_{i=0}^{u+r-2} a_{3i+2} x^{3i}.$$
Then, the cube root $\sqrt[3]{A}$ can be computed as [barreto04],
$$\sqrt[3]{A} = \sum_{i=0}^{u} a_{3i} x^{i} + x^{1/3} \cdot \sum_{i=0}^{u+r-2} a_{3i+1} x^{i} + x^{2/3} \cdot \sum_{i=0}^{u+r-2} a_{3i+2} x^{i}. \qquad (2)$$
One can compute a cube root by finding the per-field constants $x^{1/3}$ and $x^{2/3}$.
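Equation (2) can be exercised in software. The sketch below assumes $m = 97$ and $f = x^{97} + x^{12} + 2$, and obtains the constant $x^{1/3} = x^{3^{m-1}}$ by cubing $x$ exactly $m - 1$ times. That is merely a convenient way to get the constant here; a hardware design would presumably hard-wire it. All function names are illustrative.

```python
M = 97   # f = x^97 + x^12 + 2, hence x^97 = 2*x^12 + 1 over F_3

def reduce_mod_f(p):
    """Reduce a coefficient list (index = degree) modulo f and modulo 3."""
    p = [c % 3 for c in p]
    for d in range(len(p) - 1, M - 1, -1):
        c, p[d] = p[d], 0
        p[d - M + 12] = (p[d - M + 12] + 2 * c) % 3
        p[d - M] = (p[d - M] + c) % 3
    return (p + [0] * M)[:M]

def mul(a, b):
    p = [0] * (2 * M - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return reduce_mod_f(p)

def cube(a):
    spread = [0] * (3 * (M - 1) + 1)
    spread[::3] = a
    return reduce_mod_f(spread)

# Per-field constants: x^(1/3) = x^(3^(M-1)), and x^(2/3) is its square.
X = [0, 1] + [0] * (M - 2)
x13 = X
for _ in range(M - 1):
    x13 = cube(x13)
x23 = mul(x13, x13)

def cube_root(a):
    """Equation (2): split a by index mod 3, then recombine with the two constants."""
    pad = lambda p: p + [0] * (M - len(p))
    c0, c1, c2 = pad(a[0::3]), pad(a[1::3]), pad(a[2::3])
    t1, t2 = mul(x13, c1), mul(x23, c2)
    return [(p + q + r) % 3 for p, q, r in zip(c0, t1, t2)]

a = [(i * i + 2 * i + 1) % 3 for i in range(M)]   # arbitrary sample element
```

Once the two constants are known, a cube root costs two multiplications by fixed elements plus additions.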
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (46 / 91)
Irreducible trinomials $P(x) = x^m - x^k + 1$, with $m \equiv k \equiv r \bmod 3$
Let us consider the ternary field $\mathbb{F}_{3^m}$ generated by the trinomial $P(x) = x^m - x^k + 1$, irreducible over $\mathbb{F}_3$. Let us also assume that the extension degree $m$ can be expressed as $m = 3u + r$, $u \ge 1$, and $k = 3v + r$, $0 \le v \le u$, with $m \equiv k \equiv r \bmod 3$, $r \ne 0$ and $u - 2v \ge 1$. In [barreto04] it was found that for $r = 1$ we have
$$x^{2/3} = -x^{u+1} + x^{v+1}; \qquad x^{1/3} = x^{2u+1} + x^{u+v+1} + x^{2v+1},$$
whereas for $r = 2$ we have
$$x^{1/3} = -x^{u+1} + x^{v+1}; \qquad x^{2/3} = x^{2u+2} + x^{u+v+2} + x^{2v+2}.$$
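These constants can be checked by cubing them, using only the relation $x^k - x^m = 1$ that holds modulo $P(x)$ and the fact that cubing is additive in characteristic 3. The derivation below is a verification sketch, not quoted from [barreto04].

```latex
\begin{aligned}
r = 1:\quad & \left(-x^{u+1} + x^{v+1}\right)^3 = -x^{3u+3} + x^{3v+3}
            = x^{2}\left(x^{k} - x^{m}\right) = x^{2}, \\
            & x^{1/3} = \frac{\left(x^{2/3}\right)^{2}}{x}
            = \frac{x^{2u+2} - 2x^{u+v+2} + x^{2v+2}}{x}
            = x^{2u+1} + x^{u+v+1} + x^{2v+1}, \\
r = 2:\quad & \left(-x^{u+1} + x^{v+1}\right)^3 = -x^{3u+3} + x^{3v+3}
            = x\left(x^{k} - x^{m}\right) = x, \\
            & x^{2/3} = \left(x^{1/3}\right)^{2}
            = x^{2u+2} - 2x^{u+v+2} + x^{2v+2}
            = x^{2u+2} + x^{u+v+2} + x^{2v+2}.
\end{aligned}
```

In both cases $-2 \equiv 1 \pmod 3$ explains the all-positive signs of the three-term constant.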
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (47 / 91)
Frobenius map over $\mathbb{F}_{3^m}$
- feedback loop for successive cubings
- sign selection for computing either $a^3$ or $-a^3$
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (48 / 91)
Inversion over $\mathbb{F}_{3^m}$
Extended Euclidean Algorithm?
- fast computation
- ... but need for additional hardware
Our solution: Fermat's little theorem
$$a^{-1} = a^{3^m - 2} \text{ on } \mathbb{F}_{3^m} \quad (a \ne 0)$$
- algorithm by Itoh and Tsujii
- requires only multiplications and cubings over $\mathbb{F}_{3^m}$
- only one inversion for the full pairing: delay overhead is negligible (< 1%)
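A sketch of inversion by exponentiation in the style of Itoh and Tsujii, assuming $m = 97$ and $f = x^{97} + x^{12} + 2$. It builds $t = a^{(3^k - 1)/2}$ by doubling $k$ along the bits of $m - 1$, then uses $3^m - 2 = 1 + 3 \cdot 2 \cdot (3^{m-1} - 1)/2$. The particular addition chain is an assumption; the slides do not give the one used in the design.

```python
M = 97   # f = x^97 + x^12 + 2, hence x^97 = 2*x^12 + 1 over F_3

def reduce_mod_f(p):
    """Reduce a coefficient list (index = degree) modulo f and modulo 3."""
    p = [c % 3 for c in p]
    for d in range(len(p) - 1, M - 1, -1):
        c, p[d] = p[d], 0
        p[d - M + 12] = (p[d - M + 12] + 2 * c) % 3
        p[d - M] = (p[d - M] + c) % 3
    return (p + [0] * M)[:M]

def mul(a, b):
    p = [0] * (2 * M - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return reduce_mod_f(p)

def cube(a):
    spread = [0] * (3 * (M - 1) + 1)
    spread[::3] = a
    return reduce_mod_f(spread)

def inverse(a):
    """a^(3^M - 2), using only multiplications and cubings."""
    t, k = a, 1                          # invariant: t = a^((3^k - 1)/2)
    for bit in bin(M - 1)[3:]:           # bits of M-1 after the leading one
        s = t
        for _ in range(k):               # s = t^(3^k)
            s = cube(s)
        t, k = mul(s, t), 2 * k          # k -> 2k
        if bit == "1":
            t, k = mul(cube(t), a), k + 1    # k -> k + 1
    return mul(a, cube(mul(t, t)))       # a * (t^2)^3 with k = M - 1

a = [(i * i + i + 1) % 3 for i in range(M)]   # arbitrary nonzero sample element
```

For $m = 97$ this chain uses 9 multiplications and 96 cubings.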
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (49 / 91)
The full processing element
For the Tate pairing: limited parallelism between additions, multiplications and Frobenius maps
Can we share hardware resources between the three operators?
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (50 / 91)
What can we share?
Input and output registers
Partial product generators:
- sign selection for the addition / subtraction
- partial products for the multiplication
- multiplication by the $w_j$'s for the Frobenius map
Multi-operand addition tree
Feedback loops for accumulation
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (51 / 91)
Our unified operator
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (52 / 91)
Field towering $\mathbb{F}_{3^{6m}}$
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (53 / 91)
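Computation of the Tate pairing
$$e : E(\mathbb{F}_{p^m})[\ell] \times E(\mathbb{F}_{p^m})[\ell] \to \mu_\ell \subseteq \mathbb{F}_{p^{km}}^\times$$
Arithmetic over $\mathbb{F}_{p^m}$:
- polynomial basis: $\mathbb{F}_{p^m} \cong \mathbb{F}_p[x]/(f(x))$
- $f(x)$: degree-$m$ polynomial irreducible over $\mathbb{F}_p$
Arithmetic over $\mathbb{F}_{p^{km}}^\times$:
- tower-field representation
- only arithmetic over the underlying field $\mathbb{F}_{p^m}$
Operations over $\mathbb{F}_{p^m}$:

| Base field ($\mathbb{F}_{p^m}$) | Char. 2: $\mathbb{F}_{2^m}$ | Char. 2: $\mathbb{F}_{2^{313}}$ | Char. 3: $\mathbb{F}_{3^m}$ | Char. 3: $\mathbb{F}_{3^{127}}$ |
|---|---|---|---|---|
| $+/-$ | $27\lfloor m/2 \rfloor + 75$ | 4287 | $119\lfloor m/4 \rfloor + 260$ | 3949 |
| $\times$ | $7\lfloor m/2 \rfloor + 29$ | 1121 | $25\lfloor m/4 \rfloor + 93$ | 868 |
| $a^p$ | $6m + 9$ | 1887 | $17\lfloor m/2 \rfloor + 8$ | 1079 |
| $a^{-1}$ | 1 | 1 | 1 | 1 |

Software not well suited to small characteristic: need for hardware acceleration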
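The closed-form counts in the table can be evaluated directly, which also confirms the two numeric columns. The function and key names below are illustrative.

```python
def ops_char2(m):
    """Operation counts over F_{2^m}, from the formulas in the table."""
    return {"add": 27 * (m // 2) + 75,
            "mul": 7 * (m // 2) + 29,
            "frobenius": 6 * m + 9,
            "inv": 1}

def ops_char3(m):
    """Operation counts over F_{3^m}, from the formulas in the table."""
    return {"add": 119 * (m // 4) + 260,
            "mul": 25 * (m // 4) + 93,
            "frobenius": 17 * (m // 2) + 8,
            "inv": 1}

c2 = ops_char2(313)   # add 4287, mul 1121, frobenius 1887
c3 = ops_char3(127)   # add 3949, mul 868, frobenius 1079
```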
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (57 / 91)
The best area-time product of the literature...
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (59 / 91)
... But still quite slow (or not the fastest, at least!)
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (60 / 91)
Weil descent-based attacks
We now consider:
$$E(\mathbb{F}_{3^{m \cdot n}})[\ell] \quad \text{with } m \text{ prime and } n \text{ small}$$
Weil descent (or Weil restriction of scalars) applies:
$$E(\mathbb{F}_{3^{m \cdot n}}) \cong W_E(\mathbb{F}_{3^m})$$
Gaudry–Hess–Smart attack:
- $W_E(\mathbb{F}_{3^m})$ might map to $\mathrm{Jac}(C)$, with $C$ a curve of genus at least $n$
- index calculus algorithm: solve DLP in $O\big((3^m)^{2 - 2/n}\big)$
Static Diffie–Hellman problem
- leakage when reusing private key (e.g. ElGamal encryption)
- Granger's attack: complexity in $O\big((3^m)^{1 - 1/(n+1)}\big)$
- revoking the key after a certain amount of use is an effective workaround
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (62 / 91)
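Suitable curves for 128-bit security level
Cost of the attacks (bits): columns Pollard's ρ, FFS, GHS and SDH

| $p^m$ | $n$ | $\log_2 \ell$ | Pollard's ρ | FFS | GHS | SDH |
|---|---|---|---|---|---|---|
| $3^{503}$ | 1 | 697 | 342 | 132 | – | – |
| $3^{97}$ | 5 | 338 | 163 | 130 | 245 | 128 |
| $3^{67}$ | 7 | 612 | 300 | 129 | 182 | 92 |
| $3^{53}$ | 11 | 672 | 330 | 140 | 152 | 77 |
| $3^{43}$ | 13 | 764 | 376 | 138 | 125 | 63 |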
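The GHS and SDH columns appear to follow from the exponents $2 - 2/n$ and $1 - 1/(n+1)$ quoted for the Weil descent attacks. The sketch below evaluates them in bits, ignoring any constants hidden in the $O(\cdot)$. Rounding down is an assumption, chosen because it reproduces the table entries.

```python
from math import floor, log2

def ghs_bits(m, n):
    """log2 of (3^m)^(2 - 2/n): index calculus after Weil descent."""
    return (2 - 2 / n) * m * log2(3)

def sdh_bits(m, n):
    """log2 of (3^m)^(1 - 1/(n+1)): Granger's static Diffie-Hellman attack."""
    return (1 - 1 / (n + 1)) * m * log2(3)

curves = [(97, 5), (67, 7), (53, 11), (43, 13)]     # (m, n) rows with n > 1
table = [(m, n, floor(ghs_bits(m, n)), floor(sdh_bits(m, n))) for m, n in curves]
```

Only the row with $m = 97$, $n = 5$ keeps both estimates at or above 128 bits, which is consistent with that curve being the one used in the experimental setup.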
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (63 / 91)
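Experimental setup
Full Tate pairing computation over $E(\mathbb{F}_{3^{97 \cdot 5}})$
Operation counts over $\mathbb{F}_{3^{97}}$:

| | $\times$ | $+$ | $(\cdot)^3$ |
|---|---|---|---|
| $\mathbb{F}_{3^{97}}$ | 37289 | 253314 | 21099 |

Finite field coprocessor
- Prototyped on Xilinx Virtex-4 LX FPGAs
- Post-place-and-route timing and area estimations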
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (64 / 91)
Calculation time
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (65 / 91)
Motivations
High speed is more important than low resources for some cryptographic applications
Explore the other end of the area vs. time tradeoff:
- faster but larger than the unified operator
- what about the area-time product?
Accelerate the computation by extracting as much parallelism as possible...
... without dramatically increasing the resource requirements
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (67 / 91)
Computation of the $\eta_T$ pairing
The Tate pairing over $E(\mathbb{F}_{p^m})$ is computed in two main steps
$$e(P, Q) = \eta_T(P, Q)^M$$
Computation of the $\eta_T$ pairing
- via Miller's algorithm: loop of $(m + 1)/2$ iterations
- result only defined modulo $N$-th powers in $\mathbb{F}_{p^{km}}^\times$, with $N = \#E(\mathbb{F}_{p^m})$
Final exponentiation by $M = (p^{km} - 1)/N$
- required to obtain a unique value for each congruence class
- example in characteristic 3 ($k = 6$ and $N = 3^m + 1 \pm 3^{(m+1)/2}$):
$$M = \frac{3^{6m} - 1}{3^m + 1 \pm 3^{(m+1)/2}} = \left(3^{3m} - 1\right)\left(3^m + 1\right)\left(3^m + 1 \mp 3^{(m+1)/2}\right)$$
- exploit the special form of the exponent: ad-hoc algorithm
Two distinct computational requirements ⇒ use two distinct coprocessors
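The factorisation of the final exponent can be confirmed with exact integer arithmetic. The sketch below takes `sign = +1` or `-1` for the $\pm$ in $N$; the function names are illustrative.

```python
def final_exponent(m, sign):
    """M = (3^(6m) - 1) / N with N = 3^m + 1 + sign * 3^((m+1)/2), for odd m."""
    n = 3**m + 1 + sign * 3**((m + 1) // 2)
    num = 3**(6 * m) - 1
    assert num % n == 0                  # N divides 3^(6m) - 1
    return num // n

def factored(m, sign):
    """The product form: note the opposite sign in the last factor."""
    return (3**(3 * m) - 1) * (3**m + 1) * (3**m + 1 - sign * 3**((m + 1) // 2))

checks = [final_exponent(97, s) == factored(97, s) for s in (+1, -1)]
```

The identity rests on $3^{2m} - 3^m + 1 = (3^m + 1)^2 - 3^{m+1}$, a difference of two squares when $m$ is odd.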
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (68 / 91)
Reduced Tate pairing
[Figure: block diagram. Two inputs in $E(\mathbb{F}_{3^m})[\ell]$ feed the non-reduced pairing (iterative computation), whose output in $\mathbb{F}_{3^{6m}}^\times$ feeds the final exponentiation (irregular algorithm), which produces a value in $\mu_\ell \subseteq \mathbb{F}_{3^{6m}}^\times$.]
Input: two points $P$ and $Q$ in $E(\mathbb{F}_{3^m})[\ell]$
Output: an $\ell$-th root of unity in the extension $\mathbb{F}_{3^{6m}}^\times$
Two very different steps
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (69 / 91)
Two coprocessors for the $\eta_T$ pairing
The two operations are purely sequential
Only one active coprocessor at every moment
Pipeline the data between the two coprocessors
- both of them are kept busy
- higher throughput
Balance the computation time between the two coprocessors
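Why balancing matters can be seen with a toy throughput model. The cycle counts below are hypothetical placeholders, not measurements from the talk, and the model ignores any hand-over cost between the two coprocessors.

```python
def throughput(t_pairing, t_final_exp, pipelined):
    """Pairings per clock cycle, given the two stage latencies in cycles."""
    if pipelined:
        # Both coprocessors work at once; the slower stage sets the pace.
        return 1 / max(t_pairing, t_final_exp)
    # Purely sequential: one coprocessor idles while the other works.
    return 1 / (t_pairing + t_final_exp)

balanced_seq = throughput(1000, 1000, pipelined=False)    # hypothetical timings
balanced_pipe = throughput(1000, 1000, pipelined=True)
unbalanced_pipe = throughput(1500, 500, pipelined=True)   # same total work
```

In this model, pipelining two balanced stages doubles the throughput, while an unbalanced split of the same total work gains less.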
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (70 / 91)
$\eta_T$ pairing algorithm
$$\eta_T : E(\mathbb{F}_{3^m})[\ell] \times E(\mathbb{F}_{3^m})[\ell] \to \mathbb{F}_{3^{6m}}^\times$$
Three tasks per iteration:
1. update the coordinates
2. compute the line equation
3. accumulate the new factor
Total cost per iteration: 17 ×, 4 Frobenius/inverse Frobenius and 30 + over $\mathbb{F}_{3^m}$
Cost of the inverse Frobenius: same as the Frobenius
Francisco Rodrıguez-Henrıquez Hardware Implementation of Pairings (71 / 91)
Accelerating the $\eta_T$ pairing
Total cost: 17 ×, 2 Frobenius and 2 inverse Frobenius maps, and 30 + over $\mathbb{F}_{3^m}$ per iteration
- Frobenius/inverse Frobenius and +: cheap and fast operations
- critical operation: ×
Need for a fast parallel multiplier: Karatsuba
A 128-bit three-stage pipelined Karatsuba multiplier architecture
A parallel Karatsuba multiplier
  - fully parallel: all sub-products are computed in parallel
  - pipelined architecture: higher clock frequency, one product per cycle
  - sub-products recursively implemented as Karatsuba-Ofman multipliers
  - support for other variants: odd-even split, 3-way split, ...
  - final reduction modulo the irreducible polynomial f
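The recursive structure listed above can be sketched in software. The following is a minimal model, not the hardware design from the talk: polynomials over $\mathbb{F}_3$ are stored as coefficient lists (lowest degree first), the split is the plain two-way one, and the function names, the recursion threshold and the small example polynomial $x^3 + 2x + 1$ are all assumptions made for illustration.

```python
P = 3  # field characteristic: coefficients live in F_3


def poly_add(a, b):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % P
            for i in range(n)]


def poly_sub(a, b):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % P
            for i in range(n)]


def schoolbook(a, b):
    """Quadratic reference multiplication, used as the recursion base case."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def karatsuba(a, b, threshold=4):
    """Two-way Karatsuba: three half-size sub-products instead of four."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    lo = karatsuba(a0, b0, threshold)   # a0 * b0
    hi = karatsuba(a1, b1, threshold)   # a1 * b1
    cross = karatsuba(poly_add(a0, a1), poly_add(b0, b1), threshold)
    mid = poly_sub(poly_sub(cross, lo), hi)  # a0*b1 + a1*b0
    out = [0] * (2 * n - 1)
    for i, c in enumerate(lo):
        out[i] = (out[i] + c) % P
    for i, c in enumerate(mid):
        out[i + h] = (out[i + h] + c) % P
    for i, c in enumerate(hi):
        out[i + 2 * h] = (out[i + 2 * h] + c) % P
    return out


def reduce_mod(c, f):
    """Final reduction modulo a monic polynomial f of degree m."""
    c = list(c)
    m = len(f) - 1
    for i in range(len(c) - 1, m - 1, -1):
        q = c[i]
        if q:
            for j in range(m + 1):
                c[i - m + j] = (c[i - m + j] - q * f[j]) % P
    c = c[:m]
    return c + [0] * (m - len(c))


def field_mul(a, b, f):
    return reduce_mod(karatsuba(a, b), f)
```

In hardware the three sub-products are instantiated side by side and separated by pipeline registers. The recursion here only mirrors how the operands are split and recombined.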
Accelerating the ηT pairing
ηT coprocessor based on a single large multiplier:
  - parallel Karatsuba architecture
  - 7-stage pipeline
  - one product per cycle
Challenge: keep the multiplier busy at all times
Careful scheduling to avoid pipeline bubbles (idle cycles):
  - ensure that multiplication operands are always available
  - avoid memory congestion issues
We managed to accomplish that: our processor computes the Miller loop in just $17 \cdot (m+3)/2$ clock cycles (including the initialization phase)
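As a quick worked example of the cycle count stated above, the formula can be evaluated for the characteristic-3 extension degrees that appear on the results slide (m = 97, 193, 313). The function name is hypothetical, and the sketch assumes m is odd so that the division is exact.

```python
def miller_loop_cycles(m):
    """Clock cycles for the Miller loop, 17 * (m + 3) / 2, for odd m."""
    if m % 2 != 1:
        raise ValueError("m is assumed to be odd")
    return 17 * (m + 3) // 2


cycles = {m: miller_loop_cycles(m) for m in (97, 193, 313)}
# cycles == {97: 850, 193: 1666, 313: 2686}
```

With one product per cycle and 17 multiplications per iteration, the count corresponds to (m + 3)/2 fully occupied multiplier slots, that is, the (m + 1)/2 loop iterations plus one slot for initialization.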
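Modified Algorithm
$\eta_T : E(\mathbb{F}_{3^m})[\ell] \times E(\mathbb{F}_{3^m})[\ell] \to \mathbb{F}_{3^{6m}}^{\times}$
Modified algorithm: 17 multiplications, 2 Frobenius, 2 inverse Frobenius and 30 additions over $\mathbb{F}_{3^m}$
Previous algorithm: 17 multiplications, 10 Frobenius and 38 additions
Cost of the inverse Frobenius: same as the Frobenius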
A parallel operator for the ηT pairing
The final exponentiation
Compute $e(P,Q)$ as $\eta_T(P,Q)^M$ with $\eta_T(P,Q) \in \mathbb{F}_{3^{6m}}^{\times}$ and
$M = \left(3^{3m} - 1\right)\left(3^m + 1\right)\left(3^m + 1 \mp 3^{(m+1)/2}\right)$
Operations over $\mathbb{F}_{3^m}$: 73 multiplications, $3m + 3$ Frobenius maps, $3m + 175$ additions, and 1 inversion (about $\log m$ multiplications and $m - 1$ Frobenius maps)
Cost of the ηT pairing:
  - $(m + 1)/2$ iterations
  - 17 multiplications, 10 Frobenius and 30 additions over $\mathbb{F}_{3^m}$ per iteration
The final exponentiation is much cheaper than the ηT pairing
Challenge for the final exponentiation:
  - computation in the same time as the ηT pairing
  - ... using as few resources as possible
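The exponent M and the cost comparison above can be checked numerically. The sketch below uses hypothetical function names; the `sign` argument selects the branch of the ∓ in the third factor. It relies only on the identity $3^{6m} - 1 = (3^{3m}-1)(3^m+1)(3^m+1-3^{(m+1)/2})(3^m+1+3^{(m+1)/2})$, so M always divides $3^{6m} - 1$ and the cofactor is the third factor with the opposite sign.

```python
def final_exponent(m, sign):
    """M = (3^(3m) - 1)(3^m + 1)(3^m + 1 - sign * 3^((m+1)/2)), sign = +1 or -1."""
    if m % 2 != 1 or sign not in (1, -1):
        raise ValueError("expects odd m and sign in {+1, -1}")
    return (3 ** (3 * m) - 1) * (3 ** m + 1) * (3 ** m + 1 - sign * 3 ** ((m + 1) // 2))


def miller_loop_multiplications(m):
    """17 multiplications per iteration, (m + 1)/2 iterations."""
    return 17 * (m + 1) // 2


FINAL_EXP_MULTIPLICATIONS = 73  # excluding the single inversion

m = 97
order_of_target_group = 3 ** (6 * m) - 1
cofactors = {s: order_of_target_group // final_exponent(m, s) for s in (1, -1)}
ratio = miller_loop_multiplications(m) / FINAL_EXP_MULTIPLICATIONS
```

For m = 97 the Miller loop needs 833 multiplications against 73 (plus one inversion) for the final exponentiation, which is why a much smaller, mostly sequential unit can still finish in the same time.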
The final exponentiation
Design the smallest possible architecture supporting all the required operations over $\mathbb{F}_{3^m}$
Purely sequential scheduling
Although some parallelism is required.
We found out that using the inverse Frobenius operator is advantageous for computing the final exponentiation (as long as the irreducible polynomials are inverse-Frobenius friendly)
New coprocessor with two arithmetic units:
  - a standalone multiplier, based on a parallel-serial scheme
  - a unified operator supporting addition/subtraction, inverse Frobenius map and inverse double Frobenius map
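To make the inverse Frobenius map concrete, here is a small sketch over the toy field $\mathbb{F}_{3^3} = \mathbb{F}_3[x]/(x^3 + 2x + 1)$. The field size, the reduction polynomial and all function names are assumptions chosen for illustration; they are not the parameters of the coprocessor. The point is that cube-root extraction is an $\mathbb{F}_3$-linear map, so, like the Frobenius itself, it reduces to a fixed linear combination of the input coefficients.

```python
from itertools import product

P = 3                 # characteristic
M_DEG = 3             # toy extension degree (assumption, for illustration)
F = [1, 2, 0, 1]      # x^3 + 2x + 1, monic and irreducible over F_3


def mul(a, b):
    """Multiply two elements of F_3[x]/(F), coefficients lowest degree first."""
    prod = [0] * (2 * M_DEG - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % P
    for i in range(2 * M_DEG - 2, M_DEG - 1, -1):
        q = prod[i]
        if q:
            for j in range(M_DEG + 1):
                prod[i - M_DEG + j] = (prod[i - M_DEG + j] - q * F[j]) % P
    return prod[:M_DEG]


def frobenius(a):
    """a -> a^3."""
    return mul(mul(a, a), a)


def inverse_frobenius_reference(a):
    """Cube root as a^(3^(m-1)): m - 1 cubings, used only as a reference."""
    for _ in range(M_DEG - 1):
        a = frobenius(a)
    return a


# Precompute the cube roots of the basis elements 1, x, x^2 once.
BASIS = [[1 if i == k else 0 for i in range(M_DEG)] for k in range(M_DEG)]
CUBE_ROOT_OF_BASIS = [inverse_frobenius_reference(e) for e in BASIS]


def inverse_frobenius(a):
    """Cube root as a linear combination of precomputed basis images."""
    out = [0] * M_DEG
    for coeff, image in zip(a, CUBE_ROOT_OF_BASIS):
        for k in range(M_DEG):
            out[k] = (out[k] + coeff * image[k]) % P
    return out


ALL_ELEMENTS = [list(e) for e in product(range(P), repeat=M_DEG)]
```

How sparse the precomputed images are depends on the reduction polynomial, which is one way to read the requirement that the polynomial be "inverse-Frobenius friendly".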
A coprocessor for the final exponentiation
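Hardware accelerators
[Figure: calculation time [µs] (logarithmic axis, 10 to 1000) versus security [bits] (60 to 110) for Virtex-II Pro and Virtex-4 LX devices. Labelled data points: 6.2 µs / $\mathbb{F}_{3^{97}}$, 12.8 µs / $\mathbb{F}_{3^{193}}$, 16.9 µs / $\mathbb{F}_{3^{313}}$, 20.9 µs / $\mathbb{F}_{3^{97}}$, 100.8 µs / $\mathbb{F}_{2^{457}}$, 675.5 µs / $\mathbb{F}_{2^{557}}$]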
Hardware implementation notes
Our Xilinx FPGA implementation significantly improved on the computation time of all previously published hardware pairing coprocessors for supersingular curves
(A bit surprisingly) our architecture also enjoys the best area/time trade-off among supersingular pairing accelerators
However, because we exceeded the FPGA's capacity, we could only achieve up to 109 bits of security
Although it was not discussed here, we also implemented the Tate pairing over characteristic 2. Experimentally, we observed that our characteristic 2 and characteristic 3 accelerators achieve almost the same timing performance
In the design process of our characteristic 2 accelerator we found the following undocumented family of square-root friendly irreducible pentanomials: $f(x) = x^m + x^{m-d} + x^{m-2d} + x^d + 1$.
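Hardware implementation of pairings: comparison table

Design      | Pairing / field        | Security [bits] | Platform         | Algorithm     | Area                  | Freq. [MHz] | Cycles [x10^3] | Delay [ms]
------------+------------------------+-----------------+------------------+---------------+-----------------------+-------------+----------------+-----------
Cheung’11   |                        | 126             | Xilinx Virtex 6  | RNS par.      | 7032 slices, 32 DSPs  | 250         | 143.1          | 0.57
Fan’11      | ate 128                | 128             | Xilinx Virtex 6  | HMM par.      | 4014 slices, 42 DSPs  | 210         | 245.4          | 1.17
Estibals’10 | Tate, F_{3^{5·97}}     | 128             | Xilinx Virtex 4  | ternary field | 4755 slices, 7 BRAMs  | 192         | 428.9          | 2.23
Aranha’10   | ate, F_{2^{367}}       | 128             | Xilinx Virtex 4  | binary field  | 4518 slices           | 220         | 774.0          | 3.52
Beuchat’11  | ηT, F_{2^{691}}        | 105             | Xilinx Virtex 4  | binary field  | 78874 slices          | 130         | 2.44           | 0.02
Ghosh’11    | ηT, F_{2^{1223}}       | 128             | Xilinx Virtex 6  | binary field  | 15167 slices          | 250         | 76.0           | 0.19
Beuchat’10  | ate                    | 126             | core i7          | Montg.        | - (multi-core)        | 2800        | 2330           | 0.83
Aranha’11   | ate                    | 126             | Phenom II        | Montg.        | -                     | 3000        | 1562           | 0.52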
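The delay column is related to the other two timing columns by delay [ms] = cycles [x10^3] / frequency [MHz]. The short sketch below recomputes it from the figures as they appear in the table. The variable names and the rounding tolerance are assumptions, and the numbers are copied from the slide rather than independently verified.

```python
# (design, frequency in MHz, cycles in thousands, reported delay in ms)
ROWS = [
    ("Cheung'11",   250,  143.1, 0.57),
    ("Fan'11",      210,  245.4, 1.17),
    ("Estibals'10", 192,  428.9, 2.23),
    ("Aranha'10",   220,  774.0, 3.52),
    ("Beuchat'11",  130,  2.44,  0.02),
    ("Ghosh'11",    250,  76.0,  0.19),
    ("Beuchat'10",  2800, 2330,  0.83),
    ("Aranha'11",   3000, 1562,  0.52),
]

TOLERANCE_MS = 0.006  # slightly more than half a unit in the last printed digit


def delay_ms(freq_mhz, kilo_cycles):
    """(cycles * 1e3) / (freq * 1e6) seconds, expressed in milliseconds."""
    return kilo_cycles / freq_mhz


recomputed = {name: delay_ms(f, c) for name, f, c, _ in ROWS}
mismatches = [name for name, f, c, reported in ROWS
              if abs(delay_ms(f, c) - reported) > TOLERANCE_MS]
```

Seven of the eight rows agree with the printed delay to within rounding. The Ghosh’11 row does not (76.0 / 250 gives about 0.30 ms against the 0.19 ms shown), so one of its three figures as transcribed here may differ from the original publication.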
Some concrete open problems
0 Design a 128-bit security BN pairing hardware accelerator that is faster than the fastest software implementation.
  Idea: revisit the classical integer Montgomery multiplication?
1 Design a 128-bit security ηT pairing accelerator that is faster and has a better area-time trade-off than the one reported in [Ghosh-Chowdhury-Das CHES’11]
2 Have a look at what is going on with higher genus
  [see for example the optimal pairing over supersingular genus-2 binary hyperelliptic curves, eprint 2010/559]
3 Side-channel attacks on pairings.
  Little has been done on this topic. It appears that, at least in some cases, there is a lack of good security notions and of confidential targets. For example, all the parameters involved in the verification primitive of the BLS short signature protocol are public:
  $e(D, Q) = e(S, P)$,
  where $D = H(m)$ is the message digest, $S = aD$ is its signature, and $P$ and $Q = aP$ are the generator and the public key of the signer, respectively. See for example [Page and Vercauteren eprint 2004/283]
4 Better and more efficient algorithms
  Improvements in the Miller computation, better final exponentiation, etc. For recent improvements see for example:
  [Karabina eprint 2010/542], [Aranha-Karabina-Longa-Gebotys-López Eurocrypt 2011], [FuentesC-Knapp-RodríguezH SAC 2011], [Pereira-Simplício-Naehrig-Barreto JSS 2011]
5 Compute pairings within the context of protocols
  Besides pairings there exist other relevant building blocks/primitives for pairing-based cryptography: hashing to G1 and G2, performing scalar multiplication in G1 and G2, etc. See for example:
  [Galbraith-Lin-Scott JoC 2011], [Scott eprint 2011/334], [FuentesC-Knapp-RodríguezH SAC 2011]
6 Symmetric pairings vs. asymmetric pairings
  Factors to be considered: overall efficiency in the protocols, side-channel resistance, role in the security assumptions of the protocols.
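The BLS verification equation quoted in item 3, e(D, Q) = e(S, P), follows directly from bilinearity: e(D, aP) = e(D, P)^a = e(aD, P). The sketch below illustrates this, and the fact that verification touches only public values, with a deliberately insecure toy stand-in for a pairing: the group is the integers modulo a small prime q under addition, and e(x, y) = g^(xy) mod p. None of the constants or function names come from the talk. A real implementation would use elliptic-curve groups and a pairing such as ηT or ate.

```python
import hashlib

# Toy parameters (assumptions, NOT secure): p = 2q + 1 with q prime,
# and G generating the subgroup of order q in (Z/pZ)^*.
Q_ORDER = 1019
P_MOD = 2039
G = 4
GENERATOR = 1  # plays the role of the point P; the group is (Z/qZ, +)


def toy_pairing(x, y):
    """Bilinear map e(x, y) = G^(x*y): e(a*x, y) = e(x, y)^a = e(x, a*y)."""
    return pow(G, (x * y) % Q_ORDER, P_MOD)


def hash_to_group(message: bytes) -> int:
    """D = H(m), mapped to a nonzero group element."""
    h = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return 1 + h % (Q_ORDER - 1)


def public_key(secret: int) -> int:
    """Q = a * P."""
    return (secret * GENERATOR) % Q_ORDER


def sign(secret: int, message: bytes) -> int:
    """S = a * D."""
    return (secret * hash_to_group(message)) % Q_ORDER


def verify(pub: int, message: bytes, signature: int) -> bool:
    """Check e(D, Q) == e(S, P); every input here is public."""
    d = hash_to_group(message)
    return toy_pairing(d, pub) == toy_pairing(signature, GENERATOR)


secret_a = 123
pk = public_key(secret_a)
sig = sign(secret_a, b"hello")
```

Since the verifier never handles the secret a, a side-channel attack on this particular pairing evaluation has no confidential operand to recover, which is the point made in item 3.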
Thank you for your attention
Questions?