Page 1: Cranking Floating Point Performance Up To 11

Cranking Floating Point Performance Up To 11 

Noel Llopis, Snappy Touch

http://twitter.com/[email protected]

http://gamesfromwithin.com


Floating Point Performance

Floating point numbers

• Representation of rational numbers

• 1.2345, -0.8374, 2.0000, 14388439.34, etc

• Following IEEE 754 format

• Single precision: 32 bits

• Double precision: 64 bits
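
To make the 32-bit layout concrete, here is a small sketch that pulls a single-precision value apart into its sign, exponent and mantissa fields. The names FloatBits and DecodeFloat are made up for this illustration, and it assumes float is the 32-bit IEEE 754 format on the target.

```cpp
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 single-precision value.
struct FloatBits
{
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, implicit leading 1 for normal values
};

// Hypothetical helper: split a float into its three bit fields.
inline FloatBits DecodeFloat(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the raw bits
    FloatBits out;
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For example, 2.0f decodes to sign 0, biased exponent 128 and mantissa 0.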

Why floating point performance?

• Most games use floating point numbers for most of their calculations

• Positions, velocities, physics, etc, etc.

• Maybe not so much for regular apps

CPU

• 32-bit RISC ARM11

• 400-535 MHz

• iPhone 2G/3G and iPod Touch 1st and 2nd gen

CPU (iPhone 3GS)

• Cortex-A8 600MHz

• More advanced architecture

CPU

• No floating point support in the ARM CPU!!!

How about integer math?

• No need to do any floating point operations

• Fully supported in the ARM processor

• But...

Integer Divide

There is no integer divide instruction; division runs as a software routine
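
As a rough picture of what a divide costs without a hardware instruction, here is a sketch of shift-and-subtract division. SoftwareDivide and DivideByPowerOfTwo are hypothetical names, and this is not the actual runtime library routine, just the same idea: one loop iteration per bit instead of a single instruction.

```cpp
#include <cstdint>

// Hypothetical helper: unsigned 32-bit division by shift-and-subtract.
// The divisor must be non-zero.
inline std::uint32_t SoftwareDivide(std::uint32_t dividend, std::uint32_t divisor)
{
    std::uint32_t quotient = 0;
    std::uint64_t remainder = 0;  // 64 bits so the shift below cannot overflow
    for (int bit = 31; bit >= 0; --bit)
    {
        // Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor)
        {
            remainder -= divisor;
            quotient |= (1u << bit);
        }
    }
    return quotient;
}

// When the divisor is a power of two, a shift gives the same result for unsigned values.
inline std::uint32_t DivideByPowerOfTwo(std::uint32_t value, unsigned shift)
{
    return value >> shift;
}
```

Where the divisor is known ahead of time, a shift or a multiply by a precomputed reciprocal is usually the cheaper route.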

Fixed-point arithmetic

• Sometimes integer arithmetic doesn’t cut it

• You need to represent rational numbers

• Can use a fixed-point library.

• Performs rational arithmetic with integer values at a reduced range/resolution.

• Not so great...
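
For reference, this is roughly what a fixed-point library does under the hood. It is a minimal sketch assuming a 16.16 format (16 integer bits, 16 fractional bits); the type and function names are invented for this example and do not come from any particular library.

```cpp
#include <cstdint>

typedef std::int32_t Fixed;            // 16.16 fixed point (assumed format)
const int kFracBits = 16;
const Fixed kOne = 1 << kFracBits;     // the value 1.0

inline Fixed FixedFromFloat(float v) { return static_cast<Fixed>(v * kOne); }
inline float FixedToFloat(Fixed v)   { return static_cast<float>(v) / kOne; }

// Addition is a plain integer add.
inline Fixed FixedAdd(Fixed a, Fixed b) { return a + b; }

// Multiply in 64 bits so the intermediate product does not overflow,
// then shift back down to 16 fractional bits.
inline Fixed FixedMul(Fixed a, Fixed b)
{
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}
```

The reduced range and resolution show up directly: with this format values are limited to roughly ±32768 and nothing smaller than 1/65536 can be represented.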

Floating point support

• There’s a floating point unit

• Compiled C/C++/ObjC code uses the VFP unit for any floating point operations.

Sample program

    struct Particle
    {
        float x, y, z;
        float vx, vy, vz;
    };

    for (int i = 0; i < MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

• 7.2 seconds on an iPod Touch 2nd gen

Floating point support

Trust no one! When in doubt, check the generated assembly.

Thumb Mode

• CPU has a special Thumb mode.

• Less memory, maybe better performance.

• No floating point support.

• Every time there’s an fp operation, it switches out of Thumb, does the fp operation, and switches back into Thumb.

Thumb Mode

• It’s on by default!

• Potentially HUGE wins turning it off.

Thumb Mode

• Turning off Thumb mode increased performance in Flower Garden by over 2x

• Heavy usage of floating point operations though

• Most games will probably benefit from turning it off (especially 3D games)

Sample program with Thumb mode turned off: 2.6 seconds!

ARM assembly

DISCLAIMER: I’m not an ARM assembly expert!!!

Z80!!!

ARM assembly

• Hit the docs

• References included in your USB card

• Or download them from the ARM site

• http://bit.ly/arminfo

ARM assembly

• Reading assembly is a very important skill for high-performance programming

• Writing is more specialized. Most people don’t need to.

VFP unit

    A0 + B0 = C0
    A1 + B1 = C1
    A2 + B2 = C2
    A3 + B3 = C3

VFP unit

      A0 A1 A2 A3
    + B0 B1 B2 B3
    = C0 C1 C2 C3

Sweet! How do we use the vfp?

    "fldmias  %2, {s8-s23}    \n\t"
    "fldmias  %1!, {s0-s3}    \n\t"
    "fmuls    s24, s8, s0     \n\t"
    "fmacs    s24, s12, s1    \n\t"

    "fldmias  %1!, {s4-s7}    \n\t"

    "fmacs    s24, s16, s2    \n\t"
    "fmacs    s24, s20, s3    \n\t"
    "fstmias  %0!, {s24-s27}  \n\t"

Like this!

Writing vfp assembly

• There are two parts to it

• How to write any assembly in gcc

• Learning ARM and VFP assembly

vfpmath library

• Already done a lot of work for you

• http://code.google.com/p/vfpmathlibrary

• Vector/matrix math

• Might not be exactly what you need, but it’s a great starting point

Assembly in gcc

• Only use it when targeting the device

    #include <TargetConditionals.h>
    #if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
        #define USE_VFP
    #endif

Assembly in gcc

• The basics

    asm ("cmp r2, r1");

Assembly in gcc

• Multiple lines

    asm (
        "mov r0, #1000\n\t"
        "cmp r2, r1\n\t"
    );

Assembly in gcc

• Accessing C variables

    asm (
        // assembly code
        : // output operands
        : // input operands
        : // clobbered registers
    );

    int src = 19;
    int dest = 0;
    asm volatile (
        "add %0, %1, #42"
        : "=r" (dest)
        : "r" (src)
        :
    );

%0, %1, etc. are the operands, numbered in the order they are listed (outputs first, then inputs)
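
Here is the same operand example wrapped so that it also builds off-device. This is a sketch: the function name AddFortyTwo is made up, the asm path assumes a 32-bit ARM build (detected with the compiler's __arm__ macro), and every other target takes the plain C++ branch.

```cpp
// Hypothetical helper: adds 42 using inline assembly on 32-bit ARM,
// and plain C++ everywhere else (simulator, desktop).
inline int AddFortyTwo(int src)
{
    int dest = 0;
#if defined(__arm__)
    asm volatile (
        "add %0, %1, #42"
        : "=r" (dest)   // %0: output operand, a write-only register
        : "r" (src)     // %1: input operand, any general register
        :               // no extra clobbered registers
    );
#else
    dest = src + 42;    // portable fallback with the same result
#endif
    return dest;
}
```

Keeping a C++ fallback next to every asm block makes it easy to compare results between the device and the simulator.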

Assembly in gcc

    int src = 19;
    int dest = 0;
    asm volatile (
        "add r10, %1, #42\n\t"
        "add %0, r10, #33\n\t"
        : "=r" (dest)
        : "r" (src)
        : "r10"
    );

The clobber list names the registers the asm block modifies besides its operands (here r10)

volatile prevents “optimizations”

VFP asm

Four banks of 8 32-bit registers each

    #define VFP_VECTOR_LENGTH(VEC_LENGTH) \
        "fmrx    r0, fpscr                         \n\t" \
        "bic     r0, r0, #0x00370000               \n\t" \
        "orr     r0, r0, #0x000" #VEC_LENGTH "0000 \n\t" \
        "fmxr    fpscr, r0                         \n\t"
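
The macro is easier to follow as plain bit arithmetic. The sketch below mirrors its bic/orr pair on an ordinary integer instead of the real FPSCR register. FpscrWithVectorLength is a hypothetical name. The field positions (LEN in bits 16-18, STRIDE in bits 20-21) and the rule that the hardware vector length is the LEN field plus one are my reading of the VFP documentation, so check them against the ARM reference.

```cpp
#include <cstdint>

// Bits cleared by the macro: LEN (bits 16-18) and STRIDE (bits 20-21).
const std::uint32_t kLenStrideMask = 0x00370000u;

// Hypothetical helper: compute the FPSCR value the macro would write.
// lenField is what gets passed as VEC_LENGTH (0..7); the resulting
// vector length is lenField + 1.
inline std::uint32_t FpscrWithVectorLength(std::uint32_t fpscr, std::uint32_t lenField)
{
    fpscr &= ~kLenStrideMask;    // bic r0, r0, #0x00370000
    fpscr |= (lenField << 16);   // orr r0, r0, #0x000N0000
    return fpscr;
}
```

So VFP_VECTOR_LENGTH(3) selects vectors of four floats, and VFP_VECTOR_LENGTH(0) puts the unit back in scalar mode.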

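VFP asm

    // Assumes the VFP vector length has already been set to 3, e.g. with VFP_VECTOR_LENGTH(2).
    // Vector operands sit in banks 1-3 (s8-s31): a destination in bank 0 (s0-s7) is treated as a scalar.
    for (int i = 0; i < MaxParticles; ++i)
    {
        Particle* p = &s_particles[i];
        asm volatile (
            "fldmias  %0, {s8-s13}    \n\t"   // x, y, z, vx, vy, vz
            "fldmias  %1, {s16-s18}   \n\t"   // three dt values
            "fldmias  %2, {s24-s26}   \n\t"   // three drag values
            "fmacs    s8, s11, s16    \n\t"   // position += velocity * dt
            "fmuls    s11, s11, s24   \n\t"   // velocity *= drag
            "fstmias  %0, {s8-s13}    \n\t"
            :                                  // no register outputs; results go to memory
            : "r" (p), "r" (dtArray), "r" (dragArray)
            : "memory",
              "s8", "s9", "s10", "s11", "s12", "s13",
              "s16", "s17", "s18", "s24", "s25", "s26"
        );
    }

    for (int i = 0; i < MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

Was: 2.6 seconds
Now: 1.4 seconds!!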

VFP asm

Let’s do 6 operations at once!

    struct Particle2
    {
        float x0, y0, z0;
        float x1, y1, z1;
        float vx0, vy0, vz0;
        float vx1, vy1, vz1;
    };

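VFP asm

    // Assumes the VFP vector length has already been set to 6, e.g. with VFP_VECTOR_LENGTH(5).
    // Each 6-float vector gets its own bank (s8-s15, s16-s23, s24-s31); bank 0 is scalar only.
    for (int i = 0; i < iterations; ++i)
    {
        Particle2* p = &s_particles2[i];
        asm volatile (
            "fldmias  %0, {s8-s13}    \n\t"   // x0, y0, z0, x1, y1, z1
            "fldmias  %1, {s16-s21}   \n\t"   // vx0, vy0, vz0, vx1, vy1, vz1
            "fldmias  %2, {s24-s29}   \n\t"   // six dt values
            "fmacs    s8, s16, s24    \n\t"   // positions += velocities * dt
            "fldmias  %3, {s24-s29}   \n\t"   // six drag values, reusing bank 3
            "fmuls    s16, s16, s24   \n\t"   // velocities *= drag
            "fstmias  %0, {s8-s13}    \n\t"
            "fstmias  %1, {s16-s21}   \n\t"
            :                                  // no register outputs; results go to memory
            : "r" (&p->x0), "r" (&p->vx0), "r" (dtArray), "r" (dragArray)
            : "memory",
              "s8", "s9", "s10", "s11", "s12", "s13",
              "s16", "s17", "s18", "s19", "s20", "s21",
              "s24", "s25", "s26", "s27", "s28", "s29"
        );
    }

Was: 1.4 seconds
Now: 1.2 seconds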

VFP asm

What’s the loop/cache overhead?

    for (int i = 0; i < MaxParticles; ++i)
    {
        Particle* p = &s_particles[i];
        p->x = p->vx;
        p->y = p->vy;
        p->z = p->vz;
    }

Was: 1.2 seconds
Now: 1.2 seconds!!!!

Just touching the particle memory, with no math at all, takes as long as the VFP version.

Matrix multiply

Straight from vfpmathlib

    Touch:  0.037919 s
    Normal: 0.096855 s
    VFP:    0.042216 s

About 2x faster!
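
For comparison, a plain scalar 4x4 multiply looks something like the sketch below. It is not the vfpmathlib code: the function name is hypothetical, and it assumes column-major matrices of 16 floats (the OpenGL convention). Each output column is a sum of the first matrix's columns scaled by one component of the second matrix's column, which is the fmuls/fmacs pattern in the VFP snippet shown earlier.

```cpp
// Column-major 4x4 matrices stored as 16 floats (assumed layout).
// Computes dst = a * b. dst must not alias a or b.
inline void Matrix4MulReference(const float* a, const float* b, float* dst)
{
    for (int col = 0; col < 4; ++col)
    {
        for (int row = 0; row < 4; ++row)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
            {
                // Element (row, k) of a times element (k, col) of b.
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            dst[col * 4 + row] = sum;
        }
    }
}
```

That is 64 multiplies and 48 adds issued one at a time, where the vector version issues 16 four-wide instructions.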

Good use of vfp

• Matrix operations

• Particle systems

• Skinning

• Physics

• Procedural content generation

• ....

What about the 3GS?

(times in seconds)

              3G      3GS
    Thumb     7.2     8.0
    Normal    2.6     2.6
    VFP1      1.4     1.30
    VFP2      1.2     0.64
    Touch     1.2     0.18

More 3GS: NEON

• SIMD coprocessor

• Floating point and integer

• Huge potential

• Very little documentation right now :-(
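
To give a flavor of what NEON code can look like, here is a sketch of the position update written with compiler intrinsics rather than hand-written assembly. It assumes a toolchain that ships arm_neon.h and defines __ARM_NEON or __ARM_NEON__; IntegratePositions is a hypothetical name, and on any other target the scalar loop does all the work.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: pos[i] += vel[i] * dt for count floats.
inline void IntegratePositions(float* pos, const float* vel, float dt, int count)
{
    int i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Four floats per iteration in 128-bit NEON registers.
    for (; i + 4 <= count; i += 4)
    {
        float32x4_t p = vld1q_f32(pos + i);
        float32x4_t v = vld1q_f32(vel + i);
        p = vaddq_f32(p, vmulq_n_f32(v, dt));  // p + v * dt
        vst1q_f32(pos + i, p);
    }
#endif
    // Scalar path: leftovers on NEON builds, everything on other targets.
    for (; i < count; ++i)
    {
        pos[i] += vel[i] * dt;
    }
}
```

This layout wants positions and velocities in separate contiguous arrays, which is a different data layout from the Particle struct used earlier.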

Thank you!

Noel Llopis, Snappy Touch

http://twitter.com/[email protected]

http://gamesfromwithin.com

