Distributed Hashing

Distributed hashing has long been one of the classic uses of a distributed network. A novelty to some and a necessity to others, hashing a large number of keys can take a very long time, depending on the algorithm. Hashing does not require as much memory as rendering or compiling, so it should be a good fit for the XBOX cluster. Below is a quick print-out of how fast OpenSSL runs on a single node of our XBOX cluster, compared with the Sempron 2200+ machine mentioned earlier.

XBOX

OpenSSL 0.9.7c 30 Sep 2003
built on: Mon Feb 23 18:18:12 GMT 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                557.47k     1546.09k     2091.52k     2292.74k     2362.03k
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4               9312.01k    50638.08k   104728.32k   143168.85k   160380.25k
md5               3405.55k    15094.19k    25995.73k    31601.87k    33746.11k
hmac(md5)         1456.18k     8698.11k    19730.66k    28729.00k    33202.18k
sha1              3989.46k    15676.20k    27201.45k    33287.51k    35621.55k
rmd160            2942.53k    12834.88k    21769.13k    26220.89k    27904.68k
rc4              42422.11k    56691.63k    59720.45k    60751.63k    60867.38k
des cbc           7463.69k     8229.14k     8364.52k     8364.03k     8366.76k
des ede3          2861.27k     2966.78k     2985.30k     2990.08k     2990.08k
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc           5796.07k     6218.69k     6275.93k     6291.11k     6313.30k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     16639.31k    20924.97k    21380.52k    21482.50k    21570.44k
cast cbc         12566.61k    14372.20k    14635.41k    14653.44k    14666.41k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0035s   0.0003s    286.7   3299.3
rsa 1024 bits   0.0195s   0.0010s     51.2   1049.0
rsa 2048 bits   0.1167s   0.0032s      8.6    314.0
rsa 4096 bits   0.7650s   0.0111s      1.3     90.3
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0032s   0.0039s    313.5    259.5
dsa 1024 bits   0.0097s   0.0120s    103.3     83.4
dsa 2048 bits   0.0317s   0.0389s     31.6     25.7
OpenSSL>

Sempron 2200+

OpenSSL 0.9.7c 30 Sep 2003
built on: Mon Feb 23 18:18:12 GMT 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2               1307.26k     2797.72k     3917.72k     4364.70k     4511.21k
mdc2              3019.04k     3408.81k     3522.52k     3549.56k     3565.32k
md4              11304.80k    39934.94k   117963.41k   229924.63k   318395.57k
md5               8789.64k    29747.05k    81595.00k   144216.44k   185556.36k
hmac(md5)         5081.28k    18230.65k    57049.28k   121655.26k   181687.44k
sha1              8715.38k    26775.46k    63727.77k    97333.12k   115140.36k
rmd160            7386.51k    21200.32k    45464.16k    64026.54k    72683.38k
rc4              88866.45k    94630.03k    96065.66k    96649.60k    97376.13k
des cbc          18098.33k    18586.49k    18717.35k    18809.36k    18772.63k
des ede3          6551.90k     6620.66k     6660.51k     6666.11k     6667.71k
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc          15875.50k    16446.24k    16598.92k    16634.42k    16655.25k
rc5-32/12 cbc    73740.31k    88512.81k    92535.12k    94040.06k    94410.97k
blowfish cbc     44137.47k    48594.05k    49798.41k    50179.96k    50288.17k
cast cbc         30171.40k    32328.70k    32877.09k    33049.36k    33011.88k
aes-128 cbc      37067.76k    37903.95k    38377.07k    38499.61k    38531.59k
aes-192 cbc      32505.08k    33104.00k    33376.89k    33556.62k    33580.99k
aes-256 cbc      28702.51k    29251.51k    29532.40k    29604.97k    29624.04k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0014s   0.0001s    702.7   8190.2
rsa 1024 bits   0.0074s   0.0004s    135.4   2648.0
rsa 2048 bits   0.0452s   0.0013s     22.1    798.2
rsa 4096 bits   0.2956s   0.0042s      3.4    236.2
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0012s   0.0015s    820.1    653.8
dsa 1024 bits   0.0038s   0.0048s    266.1    209.8
dsa 2048 bits   0.0122s   0.0152s     81.8     65.6
OpenSSL>
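
For readers who want a feel for what the digest rows in these print-outs measure, here is a minimal Python sketch that times a hash over fixed-size blocks and reports bytes processed per second. It is only an approximation of the idea, not the OpenSSL benchmark itself; the function name `hash_throughput` and the 0.2 second timing window are our own assumptions, and absolute numbers from Python will not be comparable to the figures above.

```python
import hashlib
import time


def hash_throughput(algorithm, block_size, duration=0.2):
    """Return approximate bytes hashed per second for one block size.

    Hypothetical helper: hashes a zero-filled block of `block_size` bytes
    repeatedly until roughly `duration` seconds have passed.
    """
    block = b"\x00" * block_size
    processed = 0
    start = time.perf_counter()
    deadline = start + duration
    while True:
        # Each iteration hashes one independent block from scratch.
        hashlib.new(algorithm, block).digest()
        processed += block_size
        if time.perf_counter() >= deadline:
            break
    elapsed = time.perf_counter() - start
    return processed / elapsed


if __name__ == "__main__":
    sizes = (16, 64, 256, 1024, 8192)
    print("type  " + " ".join(f"{size:>8} bytes" for size in sizes))
    for name in ("md5", "sha1"):
        rates = [hash_throughput(name, size) for size in sizes]
        # Report in 1000s of bytes per second, like the print-outs above.
        print(f"{name:<6}" + " ".join(f"{rate / 1000:13.2f}k" for rate in rates))
```

Throughput should climb with block size, since the fixed per-call overhead is spread over more bytes; the same pattern shows up in both print-outs above.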

Our hopes of making this XBOX cluster a worthwhile experiment are slowly diminishing. Again, we aren't out to break any speed records here, but we would love to see some XBOXes scale well. As another point of reference, our Opteron 150 machine can perform about 1063 1024-bit RSA signs per second (in a 64-bit environment); that's approximately 20 times faster than the 51.2 signs per second that a single XBOX manages.
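
The arithmetic behind that comparison is easy to check. The sketch below uses only the 1024-bit RSA sign rates quoted in this section; the helper names are ours, and the node count assumes perfectly linear scaling, which is an idealization rather than a measured result.

```python
import math

# RSA 1024-bit signs per second, copied from the results quoted in this section.
RSA_1024_SIGNS_PER_SEC = {
    "XBOX": 51.2,
    "Sempron 2200+": 135.4,
    "Opteron 150 (64-bit)": 1063.0,
}


def speedup_over(baseline, rates):
    """Return each machine's rate as a multiple of the baseline machine's rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


def nodes_to_match(target, node, rates):
    """Nodes of type `node` needed to match `target`, assuming ideal linear scaling."""
    return math.ceil(rates[target] / rates[node])


if __name__ == "__main__":
    for name, factor in speedup_over("XBOX", RSA_1024_SIGNS_PER_SEC).items():
        print(f"{name:<22} {factor:5.1f}x the XBOX rate")
    needed = nodes_to_match("Opteron 150 (64-bit)", "XBOX", RSA_1024_SIGNS_PER_SEC)
    print(f"XBOX nodes needed to match the Opteron 150: {needed}")
```

By this measure, an eight-node XBOX cluster would still fall well short of a single Opteron 150 on RSA signing, even with no scaling losses at all.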

The XBOX is very slow, but having a whole lot of them might scale well. We look to one of our favorite programs, John the Ripper (JTR), for more advice. Distributed John, or djohn, behaves similarly to make, assigning a different portion of the total keyspace to each machine. Djohn scales almost perfectly linearly, since cracking runs are very long and there are very few places for network latency to interfere with the hashing. Since hashing a key of a given length takes essentially the same computing power regardless of the character set, the total password cracking power of the cluster is N times that of one machine, where N is the number of machines in the cluster (assuming the machines are all the same speed). You can see below how various machines performed in JTR benchmark tests. JTR was compiled with GCC 3.3.3 with only MMX optimizations.
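
To illustrate the idea of handing each machine its own slice of the keyspace, here is a minimal Python sketch. It is not djohn's actual scheme, which we have not reproduced here; the function names, the contiguous-range layout, and the shortest-first candidate ordering are all assumptions made for the example.

```python
import string


def keyspace_size(charset_len, min_len, max_len):
    """Count every candidate of length min_len..max_len over the charset."""
    return sum(charset_len ** n for n in range(min_len, max_len + 1))


def split_keyspace(total, nodes):
    """Split [0, total) into `nodes` contiguous half-open ranges.

    Range sizes differ by at most one, so equally fast machines finish
    at about the same time.
    """
    base, extra = divmod(total, nodes)
    ranges = []
    start = 0
    for i in range(nodes):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def index_to_candidate(index, charset, min_len, max_len):
    """Map a global keyspace index to its candidate string, shortest first."""
    for length in range(min_len, max_len + 1):
        count = len(charset) ** length
        if index < count:
            chars = []
            for _ in range(length):
                index, digit = divmod(index, len(charset))
                chars.append(charset[digit])
            return "".join(reversed(chars))
        index -= count
    raise IndexError("index is outside the keyspace")


if __name__ == "__main__":
    charset = string.ascii_lowercase
    total = keyspace_size(len(charset), 1, 2)
    # Eight nodes, matching the size of the cluster discussed in this article.
    for node, (lo, hi) in enumerate(split_keyspace(total, 8)):
        first = index_to_candidate(lo, charset, 1, 2)
        last = index_to_candidate(hi - 1, charset, 1, 2)
        print(f"node {node}: indices [{lo}, {hi}) -> '{first}' .. '{last}'")
```

Because each node only needs its start and end index, the coordinator sends a few bytes per work unit and then stays out of the way, which is why network latency has so little opportunity to hurt scaling.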

Charts (images not reproduced): Distributed John the Ripper 1.6.37 benchmarks for DES [64/64 BS], MD5 [32/32], and Blowfish (x32) [32/64].

A star denotes estimated performance. Our eight-way cluster fares pretty well against its workstation competition, but carefully tuning the compile options could tilt the results in any configuration's favor. Also keep in mind that 64-bit compilations of JTR yield up to a 20% performance boost on Blowfish, and we do not get that benefit on the 32-bit x86 platform on which the XBOXes run. Key cracking shows some immediate promise on our XBOX cluster, but it looks to be the only real application.

Lest we give the XBOX too much credit here, note that the puny Sempron performs much better at Blowfish and MD5 hashing. Building an equivalent cluster of Sempron machines would yield more than double the crunching power on Blowfish, and even more on MD5.


30 Comments


  • TimPope - Thursday, May 12, 2005 - link

    not bad information but i would have liked to see some kind of real world performance using openmosix.. a single x box on its own as a pc is slow but stick 2-4 together using open mosix could make a reasonably good machine and still be pretty cheap
  • Halz - Wednesday, November 17, 2004 - link

    The rule followed in the article for the -j option, "number of processors + 1", overlooked the logical processors of the Xeon's Hyperthreading. -j should have then been something around 5 instead of 3.
  • Halz - Wednesday, November 17, 2004 - link

    Simply compiling on the Opteron and Xeon with the same number of threads as the full cluster would have illustrated a difference.

    More testing should have gone into finding how many threads was the ideal number for the given platforms.
  • artifex - Saturday, November 13, 2004 - link

    Aikouka, can't you just use one of those "HD Loader" type programs WITHOUT a modchip?
    I'd be all for modding my PS/2 if I thought I could actually do something useful with it, like stream audio/video from a PC or a ReplayTV or something.
  • KristopherKubicki - Saturday, November 13, 2004 - link

    Halz: what should it have been?

    Kristopher
  • Aikouka - Thursday, November 11, 2004 - link

    23, yes, you can still do just about anything. I know with the software mod that I use, I've been having problems getting the original MS Dash to load up, but I've gotten around that using other programs for the original dashboard's functionality (dvd etc).

    You know, you can also replace the HDD with just a software mod, and it's not that hard. So, if you don't want to hardware mod and want more space, you can still put in a bigger HDD. As much as some people don't like the XBOX, in my opinion, it's probably the best console to mod.

    24, 2) Modchips also allow hdd loading if you have the PS2 HDD (using HDDLoader.) Also, it lets the warez'ers download and play games on the PS2 that they don't really own.
  • artifex - Thursday, November 11, 2004 - link

    1) what we really need is a usb-based tv tuner that actually works. That would be excellent for adding functionality both to XBoxen as cheap PVRs (though I'd still just use XBMC to stream from my ReplayTV, most of the time), but also would be great for iMacs. I'm sure if someone came up with a decent open architecture design, the community would come up with drivers for both types of systems.

    2) what are modchips for PS2s useful for, other than playing import games? Especially with the new PS2s having no drive (is there still a header on the new board style to add one back?)

    3) did I miss the obligatory dnetc test? You gotta do that, you know.
  • Booty - Thursday, November 11, 2004 - link

    I don't even own an Xbox, but reading this article has me reaching for my wallet...

    But first, I want to get this straight - I can mod the Xbox and still use XLink, right? I doubt I'd get a Live subscription anyway, but it'd be nice to have that option possible.

    Ideally I'd like to throw a bigger hard drive in there and then run XBMC, without losing the normal XBox capabilities.

    So if I can do that, I'm goin' to the store this weekend... :)
  • Halz - Thursday, November 11, 2004 - link

    The compile options for the Opteron and Xeon were starving the CPUs; the number of jobs (-j) was nowhere near optimal.