From dcarboni at gmail.com Tue Feb 1 17:22:58 2005
From: dcarboni at gmail.com (Davide Carboni)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] simulator for p2p
Message-ID: <71b79fa9050201092273a5f7ba@mail.gmail.com>

Hi,
is there any way to simulate a p2p network using a single PC? I know
ns2 but it seems very "low-level" simulation. I'd like something to
simulate a network of peers abstracting from the serialization of
messages. For instance, I'd like to model peers like objects in memory
which exchange messages invoking methods each other but taking into
account variables like the bandwidth, the latency and so forth.

Bye,
Davide

From srhea at cs.berkeley.edu Tue Feb 1 19:19:59 2005
From: srhea at cs.berkeley.edu (Sean C. Rhea)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] simulator for p2p
In-Reply-To: <71b79fa9050201092273a5f7ba@mail.gmail.com>
References: <71b79fa9050201092273a5f7ba@mail.gmail.com>
Message-ID:

On Feb 1, 2005, at 9:22 AM, Davide Carboni wrote:

> is there any way to simulate a p2p network using a single PC? I know
> ns2 but it seems very "low-level" simulation. I'd like something to
> simulate a network of peers abstracting from the serialization of
> messages. For instance, I'd like to model peers like objects in memory
> which exchange messages invoking methods each other but taking into
> account variables like the bandwidth, the latency and so forth.

Bamboo (bamboo-dht.org) comes with a simple simulator that models
latency based on real measurements (the data is from here:
http://www.pdos.lcs.mit.edu/p2psim/kingdata/). It's a pretty simple
event-driven simulator written in Java; the nice thing about it is that
you can use the same code under simulation that you use on the real
net.

To use it, download the latest Bamboo CVS snapshot and try this:

    cd bamboo/src/bamboo/sim
    ./make-startup-test.pl
    ../../../bin/run-java bamboo.sim.Simulator /tmp/startup-test.exp

It will start up 29 Bamboo nodes that will then form a Bamboo network.
It's a pretty simple example, but it should give you the idea.

The PDOS group at MIT also has a simulator. It's at
http://www.pdos.lcs.mit.edu/p2psim/. It uses threads instead of events,
and C++ instead of Java. It also models only latency.

Both of these simulators should be able to simulate 200-1000 nodes,
depending on how much core memory your machine has.

Modeling bandwidth is hard to do at scale. (It's one of the reasons NS2
doesn't scale too well.)

Sean
--
We are all in the gutter, but some of us are looking at the stars.
-- Oscar Wilde

From davidopp at cs.berkeley.edu Tue Feb 1 19:27:47 2005
From: davidopp at cs.berkeley.edu (David L. Oppenheimer)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] simulator for p2p
In-Reply-To:
Message-ID: <200502011927.LAA10387@mindbender.davido.com>

> Bamboo (bamboo-dht.org) comes with a simple simulator that models
> latency based on real measurements (the data is from here:
> http://www.pdos.lcs.mit.edu/p2psim/kingdata/). It's a pretty simple
> event-driven simulator written in Java; the nice thing about it is
> that you can use the same code under simulation that you use on the
> real net.

And because you can run the same code on the "real net," you can run
the same code under emulation on a cluster to study bandwidth effects.

David

From srhea at cs.berkeley.edu Tue Feb 1 19:51:51 2005
From: srhea at cs.berkeley.edu (Sean C. Rhea)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] simulator for p2p
In-Reply-To: <200502011927.LAA10387@mindbender.davido.com>
References: <200502011927.LAA10387@mindbender.davido.com>
Message-ID: <1f6262437486549bbe1834eb3149f490@cs.berkeley.edu>

On Feb 1, 2005, at 11:27 AM, David L. Oppenheimer wrote:

>> Bamboo (bamboo-dht.org) comes with a simple simulator that models
>> latency based on real measurements (the data is from here:
>> http://www.pdos.lcs.mit.edu/p2psim/kingdata/). It's a pretty simple
>> event-driven simulator written in Java; the nice thing about it is
>> that you can use the same code under simulation that you use on
>> the real net.
>
> And because you can run the same code on the "real net," you can run
> the same code under emulation on a cluster to study bandwidth effects.

That's a good point. We run the same code under the Bamboo simulator,
on a local cluster using ModelNet
(http://issg.cs.duke.edu/modelnet.html) to provide wide-area-like
latency and bandwidth restrictions, and on PlanetLab
(http://planet-lab.org/).

Sean
--
An atheist doesn't have to be someone who thinks he has a proof that
there can't be a god. He only has to be someone who believes that the
evidence on the God question is at a similar level to the evidence on
the were-wolf question. -- John McCarthy

From hopper at omnifarious.org Wed Feb 2 03:17:28 2005
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] simulator for p2p
In-Reply-To: <71b79fa9050201092273a5f7ba@mail.gmail.com>
References: <71b79fa9050201092273a5f7ba@mail.gmail.com>
Message-ID: <1107314248.25868.59.camel@bats.omnifarious.org>

On Tue, 2005-02-01 at 18:22 +0100, Davide Carboni wrote:
> Hi,
> is there any way to simulate a p2p network using a single PC? I know
> ns2 but it seems very "low-level" simulation. I'd like something to
> simulate a network of peers abstracting from the serialization of
> messages. For instance, I'd like to model peers like objects in memory
> which exchange messages invoking methods each other but taking into
> account variables like the bandwidth, the latency and so forth.

You could probably write a replacement for SocketModule in my
StreamModule framework (http://www.omnifarious.org/StrMod/) that could
simulate some of the latency characteristics of a network connection.
If you wrote the code to use StreamModule, you could then put in real
SocketModules instead and it would work over a real network with no
other changes.

Have fun (if at all possible),
--
Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper)

From sdaswani at gmail.com Wed Feb 2 06:35:12 2005
From: sdaswani at gmail.com (Susheel Daswani)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Altnet Patent
Message-ID: <1cd056b905020122353cd6ad68@mail.gmail.com>

Hey Folks,
I'm not sure how everyone is handling the Altnet patent threat, but in
my studies I've come across some salient points regarding patent
infringement:

"For an accused product to literally infringe a patent, EVERY element
contained in the patent claim must also be present in the accused
product or device. If a claimed apparatus has five parts, or
'elements', and the allegedly infringing apparatus has only four of
those five, it does not literally infringe. This is true even though
the defendant may have copied the four elements exactly, and regardless
of how significant or insignificant the missing element is."
'Intellectual Property in the New Technological Age', 3rd Edition,
page 230

This may already be known, but I thought I'd put it out there.
So everyone should analyse their hashing systems to see how they
compare to Altnet's patent elements. If you don't do everything they
do, you can ignore their dinky letter :). I'm going to analyse their
claims soon and compare to the systems I know.

Some more interesting information, which is probably obvious:

"[I]t does not matter [if] a defendant has ADDED several new elements
-- adding new features cannot help a defendant escape infringement."

Susheel

From samnospam at bcgreen.com Wed Feb 2 09:06:42 2005
From: samnospam at bcgreen.com (Stephen Samuel (leave the email alone))
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Altnet Patent (Prior art)
In-Reply-To: <1cd056b905020122353cd6ad68@mail.gmail.com>
References: <1cd056b905020122353cd6ad68@mail.gmail.com>
Message-ID: <42009822.1030904@bcgreen.com>

I'm thinking that one well-documented example of prior art for the
Altnet patent might be the PGP network, which identifies and
distributes PGP keys by their hash IDs. In the case of pgp.net, there
are actually a couple of lengths of hash keys: short, long and
fingerprint.

Susheel Daswani wrote:
> Hey Folks,
> I'm not sure how everyone is handling the Altnet patent threat, but in
> my studies I've come across some salient points regarding patent
> infringement:

--
Stephen Samuel +1(604)876-0426 samnospam@bcgreen.com
http://www.bcgreen.com/
Powerful committed communication. Transformation touching
the jewel within each person and bringing it to light.

From aloeser at cs.tu-berlin.de Thu Feb 3 10:32:18 2005
From: aloeser at cs.tu-berlin.de (Alexander Löser)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Paradigma Question: DHT's or Small World?
Message-ID: <4201FDB1.6F607C0C@cs.tu-berlin.de>

Hey all,
structured overlay networks based on DHTs, such as Pastry and Chord
among others, have been investigated in the past to construct scalable
and performance-oriented peer-to-peer networks.
However, unstructured networks, such as Gnutella or Kazaa, are still
widely used among the file sharing community. Recently researchers
proposed extensions to unstructured networks based on the small world
idea: peers dynamically create shortcuts to other peers based on their
interests. Over time, peers with the same interests become direct
neighbors through their shortcuts and build interest-based clusters.
Hence peers no longer flood messages but partly route their queries via
an interest-based/semantic overlay. Examples are described in [1] [2]
among others.

Comparing small world and DHT approaches is a difficult task, since
simulations usually differ in scenarios, data sets or simulation
methodology. I'm interested in scenarios and arguments PRO small
world overlays for unstructured networks. Does anybody know of actual
theoretical or practical work that compares both approaches in
different scenarios (high churn, no super peers, keyword-based search,
metadata-based search)? Which scenarios or arguments support small
world approaches for unstructured networks?

Alex

[1] Gia - Making Gnutella like P2P Systems Scalable
http://berkeley.intel-research.net/sylvia/1103-chawathe.pdf
http://seattle.intel-research.net/people/yatin/publications/talks/sigcomm2003-gia.ppt

[2] Efficient Content Location Using Interest Based Locality in
Peer-to-Peer Systems
http://www.ieee-infocom.org/2003/papers/53_01.PDF
--
___________________________________________________________

Alexander Löser
Technische Universitaet Berlin
hp: http://cis.cs.tu-berlin.de/~aloeser/
office: +49- 30-314-25551
fax : +49- 30-314-21601
___________________________________________________________

From zooko at zooko.com Thu Feb 3 12:43:26 2005
From: zooko at zooko.com (Zooko O'Whielacronx)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Re: TCP thru' double NAT?
In-Reply-To: <4201DC25.6070508@ucla.edu>
References: <409EC974.9000007@vaste.mine.nu>
 <409ECB89.8010408@locut.us>
 <20040512113330.GA2606@bitchcake.off.net>
 <4201DC25.6070508@ucla.edu>
Message-ID: <90cc48fc65f4f090eb9e558264a311db@zooko.com>

[responding on-list to off-list query]

> I know this p2p-hackers message is from loooong ago, but I had a quick
> question -- does the TCP relay currently implemented in Mnet use the
> technique described in Section 3.5 of that document? At the end it
> says that "Unfortunately, this trick may be even more fragile and
> timing-sensitive than the UDP port number prediction trick described
> above... Applications that require efficient, direct peer-to-peer
> communication over existing NATs should use UDP." It doesn't sound
> like a technique to get good results with, although you report success
> -- so I was just curious.

Hi Michael:

The Mnet hack is low-tech. A node which is not behind NAT or firewall
volunteers to be a relay server. It receives msgs from node A via TCP
and sends them to node B via TCP, all in user-land.

There are plenty of obvious drawbacks, but it works for Mnet's
purposes.

I believe Skype does something similar, when Skype's more efficient
alternatives fail.

Regards,

Zooko

---
Please excuse terse writing -- there is a baby in my arms.

From Bernard.Traversat at Sun.COM Thu Feb 3 14:04:31 2005
From: Bernard.Traversat at Sun.COM (Bernard Traversat)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Paradigma Question: DHT's or Small World?
In-Reply-To: <4201FDB1.6F607C0C@cs.tu-berlin.de>
References: <4201FDB1.6F607C0C@cs.tu-berlin.de>
Message-ID: <42022F6F.1040707@Sun.COM>

You may want to look at JXTA (www.jxta.org), which provides a hybrid
architecture allowing you to deploy both structured and ad hoc
unstructured P2P network overlays.

Cheers,

B.

From gbildson at limepeer.com Thu Feb 3 15:26:25 2005
From: gbildson at limepeer.com (Greg Bildson)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Paradigma Question: DHT's or Small World?
In-Reply-To: <4201FDB1.6F607C0C@cs.tu-berlin.de>
Message-ID:

I'd just like to point out that Gnutella does not use pure flooding
anymore and you are unlikely to find P2P networks that don't have
something akin to supernodes. Gnutella uses Bloom-filter-based keyword
index replication and dynamic querying (selectively sending out queries
until a result limit is reached) to reduce the overhead of flooding for
popular queries and to route all queries on the last hop.

Thanks
-greg

From gwendal.simon at francetelecom.com Thu Feb 3 15:49:10 2005
From: gwendal.simon at francetelecom.com (SIMON Gwendal RD-MAPS-ISS)
Date: Sat Dec 9 22:12:50 2006
Subject: TR: [p2p-hackers] Paradigma Question: DHT's or Small World?
Message-ID:

Hi,

Here are two assumptions that advocate for small-world.

The first one, related to the human language, has been partially
established by several studies [1,2] since the pioneering work of [3].
The graph of word interactions is constructed by linking two words when
they co-occur in a sentence (a fortiori in a file). The study of the
properties of these graphs shows they exhibit the small world effect
and a scale-free distribution of degrees.

The second assumption follows the observations you cite and some others
[4,5,6]. The data-sharing graph is constructed by linking two users
when they share the same file. Observations on several real traces show
that this graph also exhibits the small-world effect and the scale-free
distribution of degrees.

Besides, it is known that a human's lexicon contains a few thousand
words. This lexicon and the words contained in the documents which have
been produced and downloaded by a user define her "semantic profile".
Through the preceding assumptions, we naturally infer that the graph
generated by linking users when their semantic profiles overlap is also
small-world and scale-free.

That is, if we consider that users emit requests on keywords chosen
within their profile, we can expect that almost *all* files of interest
for a user are stored by a small set of "friends". Moreover, these
"friends" are already known by the user thanks to previous successful
queries. Therefore, it is possible to limit the search to a subspace of
the information space without degrading the quality of responses. On
the contrary, it is probable that these responses are more relevant
from the requester's point of view. For instance, a fan of "Fiona
Apple" will discover mp3s of Fiona Apple and not information on Apple
Inc. or webpages for "apple pie" cooking. Or, a European querying
information on "football" will not receive pages on the NFL.

By the way, another related concern is the publication of a file. In
Gnutella-like systems, peers just have to put their files in their
"shared directory" in order to make them available to any node in the
system. On the contrary, the task of publication in a DHT-based overlay
requires reaching as many peers as the number of words describing the
published document. Indeed, the published file has to be known by the
peers that are responsible for all the *relevant* words of the
document. This is clearly an issue for keyword-based search in DHTs. If
you want to design a search engine indexing *all* words in the
document, this task becomes unrealistic.

--------------------
Gwendal Simon
France Telecom R&D
http://solipsis.netofpeers.net

[1] D. Watts. Six Degrees.
[2] A. Barabasi. Linked: the New Science of Networks.
[3] R. Ferrer i Cancho and R. Sole. The Small World of Human Language.
[4] J. Keller, D. Stern and F. Dang Ngoc. MAAY: A Self-Adaptive Peer
Network for Efficient Document Search.
[5] V. Cholvi, P. Felber, and E.W. Biersack. Efficient Search in
Unstructured Peer-to-Peer Networks.
[6] Adriana Iamnitchi, Matei Ripeanu and Ian Foster. Small-World
File-Sharing Communities.

From bryan.turner at pobox.com Thu Feb 3 16:35:02 2005
From: bryan.turner at pobox.com (Bryan Turner)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Paradigma Question: DHT's or Small World?
In-Reply-To:
Message-ID: <200502031635.j13GZ3jZ020887@rtp-core-1.cisco.com>

Regarding Small World vs DHT;

Pedantically, there is no difference: you can map a DHT to Small World
by viewing the domain of the DHT (its keyspace) to be the semantic
information sought by the peers. Thus, peers which seek nearby points
in the keyspace are linked by Small World links, while those which seek
distant points are only occasionally referenced.

The difference that is being argued is what the USER is interested in
versus what the PEER is interested in. In a DHT, the peer is required
to be interested in keys which conform to some dynamic metric based on
the specific model of DHT being used, while there is no model for the
user's interests.
I'm not arguing for or against Small World - simply that the models are equally expressive and thus equally capable of implementing each other's features. Just something to keep in mind. And to keep things on track: Gwendal, I like your explanation of a user's semantic profile, it's very crisp and approachable. It's been difficult to explain to colleagues in the past, next time I'll use your words. ;) In the following by "Gnutella", I mean "Gnutella-like systems". Please do not be offended by my mis-representation of the specific features supported by Gnutella. I see publishing between the two mediums in a different light. While it seems simpler to publish under Gnutella, there are tradeoffs that you haven't pointed out. For instance, single-word queries and exact-file searches are significantly more difficult under the Gnutella model exactly because your query must reach all your 'friends' - and return from all of them! In effect you get worst-case performance for every query. DHTs achieve best-case performance for this type of query, but are burdened by a more complex publishing process. I would also like to argue that full-text indexing on all documents is equally difficult for *both* models. My reasoning follows from the processing requirements (in any model) to index/query a full document: 1. Process a document to produce an index. 2. Store the index for future retrieval. 3. Provide query capability to a client. 4. Discover relevant indexes to a query. 5. Search the indexes for query terms. 6. Return results It should be clear that #1, #3, and #6 are essentially the same between the two models, as some entity must perform the same amount of work for these steps regardless of how it is handled "under the covers". #2 differs only in the location where the index is stored - locally or distributed. And in the amount of work done (Gnutella;less, DHT;more). #4 differs again in the location and work, but here I argue the amount of work has reversed from #2. 
Gnutella requires *many* peers to perform complex queries against their complex indexes, which constitutes a great deal of work. OtoH, a DHT implicitly knows which peers to address, and which queries to perform (in fact, the very act of addressing a peer is effectively performing the query). #5 again differs, although I argue that the total amount of work performed is essentially the same. Given my arguments above, the total work performed by the "system" to achieve a query is roughly equivalent between the two models. There isn't any one area in which one of the systems is burdened by an order of magnitude over the other. --Bryan bryan.turner@pobox.com -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org] On Behalf Of SIMON Gwendal RD-MAPS-ISS Sent: Thursday, February 03, 2005 10:49 AM To: Peer-to-peer development. Subject: TR: [p2p-hackers] Paradigma Question: DHT's or Small World? Hi, Here are two assumptions that advocate for small-world. The first one, related to the human language, has been partially established by several studies [1,2] since the pioneering work of [3]. The graph of word interactions is constructed by linking two words when they co-occur in a sentence (a fortiori in a file). The study of the properties of these graphs shows they exhibit the small world effect and a scale-free distribution of degrees. The second assumption follows the observations you cite and some others [4,5,6]. The data-sharing graph is constructed by linking two users when they share a same file. Observations on several real traces show that this graph exhibits also the small-world effect and the scale-free distribution of degrees. Besides, it is known that the lexicon of an human contains few thousands of words. This lexicon and the words contained in the documents which have been produced and dowloaded by an user define her "semantic profile". 
Through the preceeding assumptions, we naturally infer that the graph generated by linking users when their semantic profile overlap is also small-world and scale-free. That is, if we consider that users emit requests on keywords chosen within their profile, we can expect that almost *all* files of interest for an user are stored by a small set of "friends". Moreover, these "friends" are already known by the user thanks to previous successfull queries. Therefore, it is possible to limit the search to a subspace of the information space without preventing the quality of responses. On the contrary, it is probable that these responses are more relevant for the requester point of view. For instance, a fan of "Fiona Apple" will discover mp3 of Fiona Apple and not informations on Apple Inc. or webpages for "apple pie" cooking. Or, an European querying informations on "football" will not receive pages on NFL. By the way, another related concern is the publication of a file. In a gnutella-like systems, peers just have to put their files in their "shared directory" in order to make them available by any node in the system. On the contrary, the task of publication in a DHT-based overlay requires to reach as many peers as the number of words describing the published document. Indeed, the published file has to be known by the peers that are responsible of all the *relevant* words of the document. This is clearly an issue for keyword-based search in DHTs. If you want to design a search engine indexing *all* words in the document, this task becomes unrealistic. -------------------- Gwendal Simon France Telecom R&D http://solipsis.netofpeers.net [1] D. Watts. Six Degrees. [2] A. Barabasi. Linked: the New Science of Networks. [3] R. Ferrer i Canco and R. Sole. The Small World of Human Language. [4] J. Keller, D. Stern and F. Dang Ngoc. MAAY: A Self-Adaptive Peer Network for Efficient Document Search. [5] V. Cholvi, P. Felber, and E.W. Biersack. 
Efficient Search in Unstructured Peer-to-Peer Networks. [6] Adriana Iamnitchi, Matei Ripeanu and Ian Foster, Small-World File-Sharing Communities. From Serguei.Osokine at efi.com Thu Feb 3 18:12:31 2005 From: Serguei.Osokine at efi.com (Serguei Osokine) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Paradigma Question: DHT's or Small World? Message-ID: <4A60C83D027E224BAA4550FB1A2B120E0DC32E@fcexmb04.efi.internal> On Thursday, February 03, 2005 Bryan Turner wrote: > Given my arguments above, the total work performed by the "system" > to achieve a query is roughly equivalent between the two models. Uh, looks to me that given your arguments above the models are logically equivalent, which says nothing about whether the work is the same or not. In fact, I can easily imagine the situations where the load would be orders of magnitude different for Gnutella and DHTs. Best wishes - S.Osokine. 3 Feb 2005. -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org]On Behalf Of Bryan Turner Sent: Thursday, February 03, 2005 8:35 AM To: 'Peer-to-peer development.' Subject: RE: [p2p-hackers] Paradigma Question: DHT's or Small World? Regarding Small World vs DHT; Pedantically, there is no difference.. you can map a DHT to Small World by viewing the domain of the DHT (it's keyspace) to be the semantic information sought by the peers. Thus, peers which seek nearby points in the keyspace are linked by Small World links, while those which seek distant points are only occasionally referenced. The difference that is being argued is what the USER is interested in versus what the PEER is interested in. In a DHT, the peer is required to be interested in keys which conform to some dynamic metric based on the specific model of DHT being used, while there is no model for the user's interests. I'm not arguing for or against Small World - simply that the models are equally expressive and thus equally capable of implementing each other's features. 
Just something to keep in mind. And to keep things on track: Gwendal, I like your explanation of a user's semantic profile, it's very crisp and approachable. It's been difficult to explain to colleagues in the past, next time I'll use your words. ;) In the following by "Gnutella", I mean "Gnutella-like systems". Please do not be offended by my mis-representation of the specific features supported by Gnutella. I see publishing between the two mediums in a different light. While it seems simpler to publish under Gnutella, there are tradeoffs that you haven't pointed out. For instance, single-word queries and exact-file searches are significantly more difficult under the Gnutella model exactly because your query must reach all your 'friends' - and return from all of them! In effect you get worst-case performance for every query. DHTs achieve best-case performance for this type of query, but are burdened by a more complex publishing process. I would also like to argue that full-text indexing on all documents is equally difficult for *both* models. My reasoning follows from the processing requirements (in any model) to index/query a full document: 1. Process a document to produce an index. 2. Store the index for future retrieval. 3. Provide query capability to a client. 4. Discover relevant indexes to a query. 5. Search the indexes for query terms. 6. Return results It should be clear that #1, #3, and #6 are essentially the same between the two models, as some entity must perform the same amount of work for these steps regardless of how it is handled "under the covers". #2 differs only in the location where the index is stored - locally or distributed. And in the amount of work done (Gnutella;less, DHT;more). #4 differs again in the location and work, but here I argue the amount of work has reversed from #2. Gnutella requires *many* peers to perform complex queries against their complex indexes, which constitutes a great deal of work. 
OTOH, a DHT implicitly knows which peers to address and which queries to perform (in fact, the very act of addressing a peer is effectively performing the query). #5 again differs, although I argue that the total amount of work performed is essentially the same.

Given my arguments above, the total work performed by the "system" to achieve a query is roughly equivalent between the two models. There isn't any one area in which one of the systems is burdened by an order of magnitude over the other.

--Bryan
bryan.turner@pobox.com

-----Original Message-----
From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org] On Behalf Of SIMON Gwendal RD-MAPS-ISS
Sent: Thursday, February 03, 2005 10:49 AM
To: Peer-to-peer development.
Subject: TR: [p2p-hackers] Paradigma Question: DHT's or Small World?

Hi,

Here are two assumptions that argue for small-world. The first one, related to human language, has been partially established by several studies [1,2] since the pioneering work of [3]. The graph of word interactions is constructed by linking two words when they co-occur in a sentence (a fortiori in a file). Studies of these graphs show that they exhibit the small-world effect and a scale-free degree distribution.

The second assumption follows the observations you cite and some others [4,5,6]. The data-sharing graph is constructed by linking two users when they share the same file. Observations on several real traces show that this graph also exhibits the small-world effect and a scale-free degree distribution.

Besides, it is known that a person's lexicon contains a few thousand words. This lexicon, together with the words contained in the documents that a user has produced and downloaded, defines her "semantic profile". From the preceding assumptions, we naturally infer that the graph generated by linking users whose semantic profiles overlap is also small-world and scale-free.
_______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From rita at comet.columbia.edu Thu Feb 3 18:59:39 2005 From: rita at comet.columbia.edu (Rita H. Wouhaybi) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Paradigma Question: DHT's or Small World? Message-ID: <008f01c50a22$837a66d0$9e433b80@comet.columbia.edu> Alexander L?ser wrote: > Hey all, > structured overlay networks based on DHT's, such as Pastry and Chord > among others, have been investigated in the past to construct scalable > and performance orientated peer-to-peer networks. However, unstructured > networks, such as Gnutella or Kazaa, are still widely used among the > file sharing community. Recently researchers proposed extensions to > unstructured networks networks based on the small world idea: peers > dynamically create shortcuts to other peers based on their interests. > Over a while peers with the same interests became direct neighbors > through its shortcuts and build interest based clusters. Hence peers > no longer flood messages but partly route it's queries via a interested > based/semantic overlay. Examples are described in [1] [2] among > others. > > Comparing small world and DHT approaches is a difficult task, since > simulations usually differ in scenarios, data sets or simulation > methodology. I'm interested in scenarios and arguments PRO small > world overlays for unstructured networks. Does anybody now actual > theoretic or practical work that compares both approaches in different > scenarios (high churn, no super peers, key word based search, meta data > based search)? Which scenarios or arguments support small world > approaches for unstructured networks? 
>
> Alex
>
> [1] Gia - Making Gnutella like P2P Systems Scalable
> http://berkeley.intel-research.net/sylvia/1103-chawathe.pdf
> http://seattle.intel-research.net/people/yatin/publications/talks/sigcomm2003-gia.ppt
>
> [2] Efficient Content Location Using Interest Based Locality in
> Peer-to-Peer Systems
> http://www.ieee-infocom.org/2003/papers/53_01.PDF
> --
> Alexander Löser
> Technische Universitaet Berlin
> hp: http://cis.cs.tu-berlin.de/~aloeser/
> office: +49- 30-314-25551
> fax : +49- 30-314-21601

Interesting discussion, Alex.

Given the practical and system challenges that researchers working on DHTs have faced (long time for the network to become stable, updates and maintenance when nodes join and leave, high messaging cost when adding an object to the network, ...), it has become the norm to think about the application when deciding whether to use structured (DHT) or unstructured (Gnutella-like) P2P topologies. That is probably one of the reasons why people have not compared both structures in an analysis similar to what you are asking for.

Thus, small-world and power-law topologies have emerged to bridge the gap between a totally random network and a "rigid" DHT. Note that super-peers in Kazaa and Gnutella do actually help the network become more like a small-world.

We have also worked in this area and created a P2P network with a power-law distribution that might interest you:

- Rita H. Wouhaybi and Andrew T. Campbell, "Phenix: Supporting Resilient Low-Diameter Peer-to-Peer Topologies", IEEE INFOCOM 2004, Hong Kong, China, March 7-11, 2004.

Rita H. Wouhaybi
rita@comet.columbia.edu
http://comet.columbia.edu/~rita/

From Serguei.Osokine at efi.com Thu Feb 3 19:53:22 2005
From: Serguei.Osokine at efi.com (Serguei Osokine)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Paradigma Question: DHT's or Small World?
Message-ID: <4A60C83D027E224BAA4550FB1A2B120E0DC32F@fcexmb04.efi.internal>

On Thursday, February 03, 2005 Rita H. Wouhaybi wrote:
> Note that super-peers in Kazaa and Gnutella do actually help
> the network become more like a small-world.

Not necessarily. Or at least, to a much smaller extent than intuition would suggest. Superpeers do make the network smaller in terms of node count, but at the same time they increase the traffic on the intra-ultrapeer links in exactly the same proportion, making it more difficult to route anything to the remote nodes. So the actual query reach (the degree of 'small-worldness', so to speak) improves only because of the better-than-average super-peer bandwidth:

http://www.grouter.net/gnutella/search.htm#PlainSuperpeerNetwork
http://www.grouter.net/gnutella/search.htm#Eq25

Basically, if you cannot reach all hosts in a 'flat' network (without super-peers), chances are pretty high that the introduction of super-peers won't change this situation unless the original flat network was already pretty close to being a 'small world' (fully reachable) one.

The search reach in super-peered nets like Kazaa really is better, but it comes, first, from higher-than-average superpeer bandwidth and, second, from the proactive index replication that naturally happens when a leaf connects to several superpeers at once (three or so in Kazaa's case, I believe).
This one tends to be viewed as just something done to improve connection reliability through redundancy, whereas in fact it also improves the query reach in direct proportion to the number of redundant links:

http://www.grouter.net/gnutella/search.htm#RedundantSuperpeerClusters

I think this effect was first noted by the Stanford P2P research group, which named it 'k-redundancy':

http://www-db.stanford.edu/~byang/pubs/superpeer.pdf

Best wishes -
S.Osokine.
3 Feb 2005.

From aloeser at cs.tu-berlin.de Fri Feb 4 12:57:50 2005
From: aloeser at cs.tu-berlin.de (Alexander Löser)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] Paradigma Question: DHT's or Small World?
References: <4201FDB1.6F607C0C@cs.tu-berlin.de>
Message-ID: <4203714E.E1EA93CD@cs.tu-berlin.de>

Thank you very much for sharing this discussion! You gave me very valuable comments on the design question of choosing between small world and DHTs. If I understood your arguments correctly, small world should be the preferred paradigm if the system design requires the following (hard or soft) features:

(Hard features)

Churn: The system should support a high churn rate of peers and/or a high churn rate of objects. By the way, since these hypotheses are intuitive but unproved, does anybody know of theoretical or experimental work that proved them? Furthermore, maybe this question is a bit naive, but what exactly is "high"?

Complex queries: The system allows a user to pose complex queries, e.g. several keywords or, for documents annotated with metadata, more than one (semantic) predicate per query.

(Soft features)

Profile locality: One peer maps to one user. A user is probably not interested in, or willing to, transfer their local profile to a global index, but prefers to keep it locally, e.g. for anonymity or to be able to delete entries.

Popularity: If most of the searches go for popular objects, small world may be the first choice. For example, this is the case for most music sharing networks.

Community search: Depending on the shortcut creation strategies between friends in a small-world network, the small-world paradigm supports the data-sharing graph between people with similar interests. By the way: does it also support similar semantics?
What kind of application scenario suits to this requirements? I think of a networked desktop search application. Similar to Gnutella, some people publish some of its documents, most don't. Some of them are annotated by meta data, probably with the same vocabulary or within the same ontology, some not. Users pose keyword queries, similar in a single desktop search engine. Queries either match the documents filename, folder or (if any) documents meta data. Would be the small world paradigm support such a system? Alex -- ___________________________________________________________ Alexander L?ser Technische Universit?t Berlin http://cis.cs.tu-berlin.de/~aloeser/ office : +49- 30-314-25551 fax : +49- 30-314-21601 skype : hallo.alex ___________________________________________________________ From gwendal.simon at francetelecom.com Fri Feb 4 13:26:21 2005 From: gwendal.simon at francetelecom.com (SIMON Gwendal RD-MAPS-ISS) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Paradigma Question: DHT's or Small World? Message-ID: Hi, > What kind of application scenario suits to this requirements? > I think of a networked desktop search application. Similar to > Gnutella, some people publish some of its documents, most > don't. Some of them are annotated by meta data, probably with > the same vocabulary or within the same ontology, some not. > Users pose keyword queries, similar in a single desktop > search engine. Queries either match the documents filename, > folder or (if any) documents meta data. Why do you want to restrict search to meta-data ? Google don't ! It must be possible to perform full-text search... Besides, how to define a world common ontology that could fit all future needs ? 
-------------------- Gwendal Simon France Telecom R&D http://solipsis.netofpeers.net From aloeser at cs.tu-berlin.de Fri Feb 4 13:43:11 2005 From: aloeser at cs.tu-berlin.de (Alexander =?iso-8859-1?Q?L=F6ser?=) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Paradigma Question: DHT's or Small World? References: Message-ID: <42037BEF.8E228624@cs.tu-berlin.de> SIMON Gwendal RD-MAPS-ISS wrote: > Hi, > > > What kind of application scenario suits to this requirements? > > I think of a networked desktop search application. Similar to > > Gnutella, some people publish some of its documents, most > > don't. Some of them are annotated by meta data, probably with > > the same vocabulary or within the same ontology, some not. > > Users pose keyword queries, similar in a single desktop > > search engine. Queries either match the documents filename, > > folder or (if any) documents meta data. > > Why do you want to restrict search to meta-data ? Google don't ! It must > be possible to perform full-text search... I assume a system where its possible to search full text. Probably for a first try, within the filename and directory structure only, later in the document itself. > > Besides, how to define a world common ontology that could fit all future > needs ? However, if the document contains any valuable meta data, the system should consider this information as well. I think of documents classified by an enterprise wide topic hierarchy or research docs classified within the ACM topic hierarchy or the documents within the google/dmoz project. Or possible doctors that exchange documents classified within a medical taxonomy. Please correct me, if my assumptions are wrong. 
Cheers Alex -- ___________________________________________________________ Alexander L?ser Technische Universit?t Berlin http://cis.cs.tu-berlin.de/~aloeser/ office : +49- 30-314-25551 fax : +49- 30-314-21601 skype : hallo.alex ___________________________________________________________ From hopper at omnifarious.org Fri Feb 4 15:03:42 2005 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Paradigma Question: DHT's or Small World? In-Reply-To: <42037BEF.8E228624@cs.tu-berlin.de> References: <42037BEF.8E228624@cs.tu-berlin.de> Message-ID: <1107529422.6165.27.camel@bats.omnifarious.org> On Fri, 2005-02-04 at 14:43 +0100, Alexander L?ser wrote: > However, if the document contains any valuable meta data, the system should > consider this information as well. I think of documents classified by an > enterprise wide topic hierarchy or research docs classified within the ACM > topic hierarchy or the documents within the google/dmoz project. Or possible > doctors that exchange documents classified within a medical taxonomy. > > Please correct me, if my assumptions are wrong. Well, one thing any search system has to deal with is being gamed. Meta-data is too easy to game. It's data for the computer, not for people, so it can be used to trick computers into giving people information they're not actually interested in. Computers, as much as possible, have to base their searching on what people will actually look at. Now, your idea of trying to automatically get people with similar interests to group together might provide a way for computers to take advantage of knowledge of those relationships to let people sort of vet documents for one another. And that could be an interesting approach. I think one of the primary problems there is the same one google has to deal with. Party crashers. People who try to become part of a community largely in order to sow disinformation, usually for commercial gain. 
Have fun (if at all possible), -- The best we can hope for concerning the people at large is that they be properly armed. -- Alexander Hamilton -- Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper) -- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20050204/c3bf8deb/attachment.pgp From bryan.turner at pobox.com Fri Feb 4 18:50:25 2005 From: bryan.turner at pobox.com (Bryan Turner) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Paradigma Question: DHT's or Small World? In-Reply-To: <4203714E.E1EA93CD@cs.tu-berlin.de> Message-ID: <200502041850.j14IoPjZ016336@rtp-core-1.cisco.com> Alex, > Churn: The system should support a high churn rate of peers/high churn rate of > objects. By the way, since these hypotheses are intuitive but unproved, does > anybody know a theoretical or experimental work, that proofed them? Furthermore, > maybe this question is a bit naive, but what exactly is high? See [1,2] for some discussion of churn and the "half-life" of a network. These models were built from Chord, but the results are useful to both systems. To answer your question more directly: "high" means close to the half-life of your network. The half-life being the time it takes half the nodes in the network to cycle off of it. If your churn rate is higher than this, you effectively cannot keep the network together, as it is outpacing your stabilization protocol. If your churn is lower, then you get a stable network. So a "high" churn rate is just under your network's half-life. > Profile locality: One peer maps to one user. Probably a user is not interested > or willing to transfer it's local profile to a global index but likes to keep > it locally, e.g. for anonymity or to delete entries. 
Depending on system design, anonymity may be improved if a 'peer' is actually a darknet of users. This provides k-anonymity within the group. See [3,4] for such protocols. Probably not relevant to your request, but it's fascinating research anyway.

> Popularity: If most of the searches go for popular objects, small world may
> be the first choice. For example, this is the case for most music sharing networks.

The greater practical concern for popularity is resolving "flash crowds" gracefully in the system. Neither the DHT nor the Small World model defines the behavior for this case. You should review some of the various solutions to this problem (too many to reference, but see [5], Section 3, and [6], Section III, for examples).

> What kind of application scenario suits to this requirements?

Any form of data repository where the primary user is an individual. For instance: Phone Book, Restaurant Guide, News Portal, Product Catalog, Wiki, etc.

Hope that helps!

--Bryan
bryan.turner@pobox.com

[1] Observations on the Dynamic Evolution of Peer-to-Peer Networks
    David Liben-Nowell, et al.
    http://citeseer.ist.psu.edu/liben-nowell02observations.html
[2] Analysis of the Evolution of Peer-to-Peer Systems
    David Liben-Nowell, et al.
    http://citeseer.ist.psu.edu/liben-nowell02analysis.html
[3] k-Anonymous Message Transmission
    Luis von Ahn, et al.
    http://www-2.cs.cmu.edu/~abortz/work/k-anon-final.html
[4] A New k-Anonymous Message Transmission Protocol
    Gang Yao, Dengguo Feng
    http://dasan.sejong.ac.kr/~wisa04/ppt/9A2.pdf
[5] Novel Architectures for P2P Applications: The Continuous-Discrete Approach
    Moni Naor, Udi Wieder
    http://citeseer.ist.psu.edu/554254.html
[6] Small World Overlay P2P Networks
    Ken Y. K. Hui, et al.
    http://www.cse.cuhk.edu.hk/~cslui/PUBLICATION/iwqos2004_small_world.pdf

From john.casey at gmail.com Mon Feb 7 08:22:30 2005
From: john.casey at gmail.com (John Casey)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] gossiping in a DHT
Message-ID:

Hi All, I have been thinking about developing a gossip information dissemination algorithm to work across a DHT. Does anyone have any links to any must-read papers on this topic? Conceptually, the process seems similar to that of gossip in an unstructured DHT. Just wondering if there was any prior work I should take a look at. Thanks. :)

From davidopp at cs.berkeley.edu Mon Feb 7 16:36:09 2005
From: davidopp at cs.berkeley.edu (David L. Oppenheimer)
Date: Sat Dec 9 22:12:50 2006
Subject: [p2p-hackers] gossiping in a DHT
In-Reply-To:
Message-ID: <200502071635.IAA27418@mindbender.davido.com>

You might want to take a look at Kelips

http://citeseer.ist.psu.edu/570786.html

David

> Hi All, I have been thinking about developing a gossip information
> dissemenation algorithm to work across a DHT. Does any one have any
> links to any must read papers on this topic? Conceptually, the process
> seems similar to that of gossip in an unstructured DHT. Just wondering
> if there was any prior work I should take a look at thanks.
:) From paul at ref.nmedia.net Tue Feb 8 13:31:56 2005 From: paul at ref.nmedia.net (Paul Campbell) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] gossiping in a DHT In-Reply-To: References: Message-ID: <20050208133156.GA11916@ref.nmedia.net> On Mon, Feb 07, 2005 at 07:22:30PM +1100, John Casey wrote: > Hi All, I have been thinking about developing a gossip information > dissemenation algorithm to work across a DHT. Does any one have any > links to any must read papers on this topic? Conceptually, the process > seems similar to that of gossip in an unstructured DHT. Just wondering > if there was any prior work I should take a look at thanks. :) Gossipping has to overcome the unstructured nature of the underlying network. In a DHT, this is not necessary since it is easy to set up a real broadcast. Look for protocols dealing with broadcasting on a DHT. For instance, one could propagate a message around the ring until it gets back to the source. This would take N-1 messages (if the originator is listed in the message) and N-1 rounds. A faster way is to use the DHT structure where some nodes broadcast multiple messages. For instance, the source could conceptually break the DHT ring up into arcs and broadcast a message to a node residing on each arc along with the arc length. In turn, the next layer of nodes can broadcast the message across their respective arcs, subdividing the problem by another level. With log(N) known neighbors, it should take log(N) rounds to reach every node and again, N-1 messages. Contrast this with N*log(N) messages in an unstructured gossipping system with log(N) rounds. Thus, without structure, the load is much higher. 
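
The two broadcast schemes described in the message above can be checked with a small simulation. This is only a sketch under simplifying assumptions that are not in the original post: the ring is idealized with N nodes at positions 0..N-1, every node can reach the nodes at power-of-two offsets from itself (a Chord-like finger table), a node may send all of its messages within one round, and there is no churn or message loss. The function names `ring_walk` and `arc_broadcast` are made up for illustration.

```python
import math


def ring_walk(n):
    """Pass the message successor-to-successor until it would return to node 0.

    Returns (messages, rounds).
    """
    messages = rounds = 0
    current = 0
    while (current + 1) % n != 0:
        current = (current + 1) % n
        messages += 1
        rounds += 1  # one hop per round, strictly sequential
    return messages, rounds


def arc_broadcast(n):
    """Broadcast by recursively splitting the ring into arcs.

    A node responsible for the arc [start, start + length) sends the message
    to the nodes at offsets 1, 2, 4, ... inside that arc and hands each of
    them the sub-arc up to the next power-of-two offset.
    Returns (messages, rounds).
    """
    delivered = [False] * n
    delivered[0] = True
    messages = rounds = 0
    frontier = [(0, n)]  # (node position, length of the arc it must cover)
    while True:
        next_frontier = []
        for start, length in frontier:
            offset = 1
            while offset < length:
                target = start + offset
                sub_length = min(offset, length - offset)
                assert not delivered[target], "each node is reached exactly once"
                delivered[target] = True
                messages += 1
                next_frontier.append((target, sub_length))
                offset *= 2
        if not next_frontier:
            break
        rounds += 1
        frontier = next_frontier
    assert all(delivered)
    return messages, rounds


if __name__ == "__main__":
    for n in (16, 1000, 1024):
        walk = ring_walk(n)
        tree = arc_broadcast(n)
        bound = math.ceil(math.log2(n))
        print(f"N={n}: ring walk {walk}, arc broadcast {tree}, ceil(log2 N)={bound}")
```

Under these assumptions both schemes use N-1 messages, but the arc-splitting version finishes in at most ceil(log2 N) rounds instead of N-1. The N*log(N) figure quoted for unstructured gossip is not simulated here.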
From anwitaman at hotmail.com Wed Feb 9 12:00:44 2005 From: anwitaman at hotmail.com (Anwitaman Datta) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] RE: p2p-hackers Digest, Vol 19, Issue 7 In-Reply-To: <20050208200004.84AD13FD25@capsicum.zgp.org> Message-ID: There are several DHT based broadcasting mechanisms in the literature, which may also interest you. The first that I came across was "structella": http://nms.lcs.mit.edu/HotNets-II/papers/structella.pdf Also, we use such a scheme for range queries in P-Grid: http://www.p-grid.org/Papers/TR-IC-2004-111.pdf as also used in prefix hash tree http://berkeley.intel-research.net/sylvia/pht.pdf - Anwitaman Today's Topics: 1. Re: gossiping in a DHT (Paul Campbell) A faster way is to use the DHT structure where some nodes broadcast multiple messages. For instance, the source could conceptually break the DHT ring up into arcs and broadcast a message to a node residing on each arc along with the arc length. In turn, the next layer of nodes can broadcast the message across their respective arcs, subdividing the problem by another level. With log(N) known neighbors, it should take log(N) rounds to reach every node and again, N-1 messages. Contrast this with N*log(N) messages in an unstructured gossipping system with log(N) rounds. Thus, without structure, the load is much higher. From john.casey at gmail.com Thu Feb 10 04:40:09 2005 From: john.casey at gmail.com (John Casey) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] gossiping in a DHT In-Reply-To: <20050208133156.GA11916@ref.nmedia.net> References: <20050208133156.GA11916@ref.nmedia.net> Message-ID: thanks guys. I've just been reading digesting the papers you have given me. 
The structella, and the pointers to the broadcasting papers it have are very useful :) On Tue, 8 Feb 2005 05:31:56 -0800, Paul Campbell wrote: > On Mon, Feb 07, 2005 at 07:22:30PM +1100, John Casey wrote: > > Hi All, I have been thinking about developing a gossip information > > dissemenation algorithm to work across a DHT. Does any one have any > > links to any must read papers on this topic? Conceptually, the process > > seems similar to that of gossip in an unstructured DHT. Just wondering > > if there was any prior work I should take a look at thanks. :) > > Gossipping has to overcome the unstructured nature of the underlying > network. In a DHT, this is not necessary since it is easy to set up a > real broadcast. Look for protocols dealing with broadcasting on a DHT. > > For instance, one could propagate a message around the ring until it gets > back to the source. This would take N-1 messages (if the originator is > listed in the message) and N-1 rounds. > > A faster way is to use the DHT structure where some nodes broadcast multiple > messages. For instance, the source could conceptually break the DHT ring up > into arcs and broadcast a message to a node residing on each arc along with > the arc length. In turn, the next layer of nodes can broadcast the message > across their respective arcs, subdividing the problem by another level. With > log(N) known neighbors, it should take log(N) rounds to reach every node and > again, N-1 messages. Contrast this with N*log(N) messages in an unstructured > gossipping system with log(N) rounds. Thus, without structure, the load is > much higher. From rabbi at abditum.com Thu Feb 10 08:01:01 2005 From: rabbi at abditum.com (Len Sassaman) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] CodeCon Reminder Message-ID: We'd like to remind those of you planning to attend this year's event that CodeCon is fast approaching. CodeCon is the premier event in 2005 for application developer community. 
It is a workshop for developers of real-world applications with working code and active development projects. Past presentations at CodeCon have included the file distribution software BitTorrent; the Peek-A-Booty anti-censorship application; the email encryption system PGP Universal; and Audacity, a powerful audio editing tool. Some of this year's highlights include Off-The-Record Messaging, a privacy-enhancing encryption protocol for instant-message systems; SciTools, a web-based toolkit for genetic design and analysis; and Incoherence, a novel stereo sound visualization tool. CodeCon registration is discounted this year: $80 for cash registrations at the door. Registration will be available every day of the conference, though tickets are limited, and attendees are encouraged to register on the first day to secure admission. CodeCon will be held February 11-13, noon-6pm, at Club NV (525 Howard Street) in San Francisco. For more information, please visit http://www.codecon.org. From aloeser at cs.tu-berlin.de Thu Feb 10 08:57:00 2005 From: aloeser at cs.tu-berlin.de (Alexander Löser) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] gossiping in a DHT References: <20050208133156.GA11916@ref.nmedia.net> Message-ID: <420B21DC.4A2E1625@cs.tu-berlin.de> Hi John, you should probably look at the HyperCuP topology [1], which permits broadcasting in a structured overlay. In Edutella [2] we use the broadcast mechanism to broadcast complex queries. By combining the broadcast with routing indices and a super-peer network, we are able to focus the broadcast on a subset of peers. Alex [1] http://projekte.learninglab.uni-hannover.de/pub/bscw.cgi/d7825/HyperCuP%20-%20Hypercubes,%20Ontologies%20and%20Efficient%20Search%20on%20P2P%20Networks [2] http://www.kbs.uni-hannover.de/Arbeiten/Publikationen/2002/www2003_superpeer.pdf John Casey wrote: > thanks guys. I've just been reading digesting the papers you have > given me. 
The structella, and the pointers to the broadcasting papers > it have are very useful :) > > On Tue, 8 Feb 2005 05:31:56 -0800, Paul Campbell wrote: > > On Mon, Feb 07, 2005 at 07:22:30PM +1100, John Casey wrote: > > > Hi All, I have been thinking about developing a gossip information > > > dissemenation algorithm to work across a DHT. Does any one have any > > > links to any must read papers on this topic? Conceptually, the process > > > seems similar to that of gossip in an unstructured DHT. Just wondering > > > if there was any prior work I should take a look at thanks. :) > > > > Gossipping has to overcome the unstructured nature of the underlying > > network. In a DHT, this is not necessary since it is easy to set up a > > real broadcast. Look for protocols dealing with broadcasting on a DHT. > > > > For instance, one could propagate a message around the ring until it gets > > back to the source. This would take N-1 messages (if the originator is > > listed in the message) and N-1 rounds. > > > > A faster way is to use the DHT structure where some nodes broadcast multiple > > messages. For instance, the source could conceptually break the DHT ring up > > into arcs and broadcast a message to a node residing on each arc along with > > the arc length. In turn, the next layer of nodes can broadcast the message > > across their respective arcs, subdividing the problem by another level. With > > log(N) known neighbors, it should take log(N) rounds to reach every node and > > again, N-1 messages. Contrast this with N*log(N) messages in an unstructured > > gossipping system with log(N) rounds. Thus, without structure, the load is > > much higher. 
-- ___________________________________________________________ Alexander Löser Technische Universität Berlin http://cis.cs.tu-berlin.de/~aloeser/ office : +49- 30-314-25551 fax : +49- 30-314-21601 ___________________________________________________________ From telecontrol at t-online.de Thu Feb 10 10:46:29 2005 From: telecontrol at t-online.de (Telecontrol) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] We need some help for our project TV-Sharing over P2P (www.cybertelly.com) Message-ID: <003001c50f5d$c71b56c0$69a2a8c0@namepc> Skipped content of type multipart/alternative -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 12199 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20050210/fd77db14/attachment.gif From telecontrol at t-online.de Thu Feb 10 10:56:38 2005 From: telecontrol at t-online.de (Telecontrol) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] We need some help for our project TV-Sharing over P2P Message-ID: <003c01c50f5f$31c5e610$69a2a8c0@namepc> Please use the email address telecontrol@t-online.de if you want to support the project. Thank you!! From sszukala at runbox.com Thu Feb 10 20:39:14 2005 From: sszukala at runbox.com (Shannon Alexander Szukala) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Re: p2p-hackers Digest, Vol 19, Issue 10 In-Reply-To: <20050210200004.046783FD65@capsicum.zgp.org> References: <20050210200004.046783FD65@capsicum.zgp.org> Message-ID: Hey I want to help out. Let me know what you are looking for. 
> Message: 1 > Date: Thu, 10 Feb 2005 11:56:38 +0100 > From: "Telecontrol" > Subject: [p2p-hackers] We need some help for our project TV-Sharing > over P2P > To: > Message-ID: <003c01c50f5f$31c5e610$69a2a8c0@namepc> > Content-Type: text/plain; charset="us-ascii" > > Please use the email adress telecontrol@t-online.de if you want to > support the project , Thank you !! From trep at cs.ucr.edu Fri Feb 11 21:11:14 2005 From: trep at cs.ucr.edu (Thomas Repantis) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Bloom Filters in Gnutella (Was: Re: Paradigma Question: DHT's or Small World?) In-Reply-To: References: <4201FDB1.6F607C0C@cs.tu-berlin.de> Message-ID: <20050211211114.GA673@angeldust.chaos> Hi Greg, interesting what you wrote, that Gnutella uses Bloom Filters. I thought that simple hash tables were exchanged. How are the Bloom Filters propagated? Just from every leaf to its ultrapeer? 
Or do ultrapeers also exchange Bloom Filters? Let me know if you have any pointers on this. I'm only aware of: http://rfc-gnutella.sourceforge.net/src/Ultrapeers_1.0.html and http://www.limewire.com/developer/query_routing/keyword%20routing.htm I've also done some work on Bloom Filters and their propagation (the first paper on: http://www.cs.ucr.edu/~trep/publications.html ) Cheers, Thomas On Thu, Feb 03, 2005 at 10:26:25AM -0500, Greg Bildson wrote: > I'd just like to point out that Gnutella does not use pure flooding anymore > and you are unlikely to find P2P networks that don't have something akin to > supernodes. Gnutella uses bloom filter based keyword index replication and > dynamic querying (selectively sending out queries until a result limit is > reached) to reduce the overhead of flooding for popular queries and to route > all queries on the last hop. > > Thanks > -greg > -- http://www.cs.ucr.edu/~trep -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20050211/e324ccd7/attachment.pgp From mgp at ucla.edu Tue Feb 15 09:52:41 2005 From: mgp at ucla.edu (Michael Parker) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Online Codes Message-ID: <4211C669.3080200@ucla.edu> Hi all, Does anyone know what happened to the "Online Codes" Sourceforge project, listed at http://sourceforge.net/projects/onlinecodes? I'm asking here for two reasons: First, because Online Codes [1, 2] would be a great tool in peer-to-peer applications, so I thought someone here might have followed the project while it was still active. Second, I've written a solid library implementation of the Online Codes encoding/decoding algorithm described in the aforementioned papers. 
Alas, only after I implemented it did I find out that the authors' company, Rateless, had patented it (or, so they allude to on their web site www.rateless.com, Digital Fountain owned the IP). I was thinking of releasing it under the GPL, but now that I've discovered patents are involved that seems like a very bad idea. So I was wondering if the Online Codes project broke up because of this, and whether I would get sued into oblivion if I ever made this code available? IANAL, but is it illegal to write such code and distribute it as a library on the net (after all, it is straight from their papers) to elucidate how the algorithm works, or only illegal to include the library in any working software program? Regards, Michael Parker [1] http://www.rateless.com/oncodes.pdf [2] http://www.rateless.com/msd.ps From stewbagz at gmail.com Tue Feb 15 10:00:42 2005 From: stewbagz at gmail.com (stew "stewbagz" mercer) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Online Codes In-Reply-To: <4211C669.3080200@ucla.edu> References: <4211C669.3080200@ucla.edu> Message-ID: <3b462676050215020043ee0d5a@mail.gmail.com> I was wondering about this as well. It appears that there was a build of rateless-copy and rateless-tunnel that was done with the cygwin tool kit, and that appears to have caused some complications. if you go to http://www.rateless.com/download_copy.html you can see the links to the binaries, but I've not been able to download anything from it. They were supposedly writing some RFCs for it too, but there is no sign of them either ... On Tue, 15 Feb 2005 01:52:41 -0800, Michael Parker wrote: > Hi all, > > Does anyone know what happened to the "Online Codes" Sourceforge > project, listed at http://sourceforge.net/projects/onlinecodes? I'm > asking here for two reasons: First, because Online Codes [1, 2] would be > a great tool in peer-to-peer applications, so I thought someone here > might have followed the project while it was still active. 
Second, I've > written a solid library implementation of the Online Codes > encoding/decoding algorithm described in the aforementioned papers. > Alas, only after I implemented it did I find out that the authors' > company, Rateless, had patented it (or, so they allude to on their web > site www.rateless.com, Digital Fountain owned the IP). I was thinking of > releasing it under the GPL, but now that I've discovered patents are > involved that seems like a very bad idea. So I was wondering if the > Online Codes project broke up because of this, and whether I would get > sued into oblivion if I ever made this code available? IANAL, but is it > illegal to write such code and distribute it as a library on the net > (after all, it is straight from their papers) to elucidate how the > algorithm works, or only illegal to include the library in any working > software program? > > Regards, > Michael Parker > > [1] http://www.rateless.com/oncodes.pdf > [2] http://www.rateless.com/msd.ps From solipsis at pitrou.net Tue Feb 15 10:15:04 2005 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Online Codes In-Reply-To: <4211C669.3080200@ucla.edu> References: <4211C669.3080200@ucla.edu> Message-ID: <1108462504.7938.25.camel@p-dhcp-333-72.rd.francetelecom.fr> > I was thinking of > releasing it under the GPL, but now that I've discovered patents are > involved that seems like a very bad idea. So I was wondering if the > Online Codes project broke up because of this, and whether I would get > sued into oblivion if I ever made this code available? 
IANAL, but is it > illegal to write such code and distribute it as a library on the net > (after all, it is straight from their papers) to elucidate how the > algorithm works, or only illegal to include the library in any working > software program? If you are European then it's still legal ;) (given your e-mail address I guess you are not...) On the other hand, if software patents are valid in your country, then you can't distribute any code that infringes the patent without a license for that patent, even if you are doing it for research purposes, etc. Indeed, one of the problems with patents is that they are not subject to the traditional limits of copyright (fair use, etc.). Regards Antoine. -- http://solipsis.netofpeers.net/ From paul at ref.nmedia.net Tue Feb 15 19:58:23 2005 From: paul at ref.nmedia.net (Paul Campbell) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] Online Codes In-Reply-To: <4211C669.3080200@ucla.edu> References: <4211C669.3080200@ucla.edu> Message-ID: <20050215195823.GB25409@ref.nmedia.net> On Tue, Feb 15, 2005 at 01:52:41AM -0800, Michael Parker wrote: > Does anyone know what happened to the "Online Codes" Sourceforge > project, listed at http://sourceforge.net/projects/onlinecodes? I'm > asking here for two reasons: First, because Online Codes [1, 2] would be > a great tool in peer-to-peer applications, so I thought someone here > might have followed the project while it was still active. Second, I've > written a solid library implementation of the Online Codes > encoding/decoding algorithm described in the aforementioned papers. > Alas, only after I implemented it did I find out that the authors' > company, Rateless, had patented it (or, so they allude to on their web > site www.rateless.com, Digital Fountain owned the IP). I was thinking of > releasing it under the GPL, but now that I've discovered patents are > involved that seems like a very bad idea. There are additional papers out there. 
There are essentially two implementations of the idea. First, there's the "LT Codes" and "Raptor Codes". Second, there's the "Online Codes". Both are very similar in a lot of ways. There are also some fundamental problems. See this one: http://citeseer.ist.psu.edu/695965.html I didn't know that Online codes have now been patented. However, if you consider the code, you've got essentially two pieces. First, there's the LDPC code being used in erasure-handling only. Second, there's the inner error correction code. The inner code is what makes the fundamental difference between LT Codes and Online Codes. However, there is absolutely nothing to say that you can't use, say, a punctured rate-1 outer code (repetition-style codes) with a suitable scrambler, or vary the inner code with something that gives equivalent performance (even a BCH code). Patents only work as long as you implement ALL the features of the patent. From gojomo at bitzi.com Wed Feb 16 05:41:05 2005 From: gojomo at bitzi.com (Gordon Mohr (@ Bitzi)) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? Message-ID: <4212DCF1.1070909@bitzi.com> Via Slashdot, as reported by Bruce Schneier: http://www.schneier.com/blog/archives/2005/02/sha1_broken.html Schneier writes: # SHA-1 Broken # # SHA-1 has been broken. Not a reduced-round version. Not a # simplified version. The real thing. # # The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu # (mostly from Shandong University in China) have been quietly # circulating a paper announcing their results: # # * collisions in the the full SHA-1 in 2**69 hash operations, # much less than the brute-force attack of 2**80 operations # based on the hash length. # # * collisions in SHA-0 in 2**39 operations. # # * collisions in 58-round SHA-1 in 2**33 operations. # # This attack builds on previous attacks on SHA-0 and SHA-1, and # is a major, major cryptanalytic result. 
It pretty much puts a # bullet into SHA-1 as a hash function for digital signatures # (although it doesn't affect applications such as HMAC where # collisions aren't important). # # The paper isn't generally available yet. At this point I can't # tell if the attack is real, but the paper looks good and this # is a reputable research team. # # More details when I have them. - Gordon @ Bitzi From jeffh at cs.rice.edu Wed Feb 16 06:51:45 2005 From: jeffh at cs.rice.edu (Jeff Hoye) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: <4212DCF1.1070909@bitzi.com> References: <4212DCF1.1070909@bitzi.com> Message-ID: <4212ED81.4030606@cs.rice.edu> Let's wait for a real report. But it's cool if it's true. -Jeff Gordon Mohr (@ Bitzi) wrote: > Via Slashdot, as reported by Bruce Schneier: > > http://www.schneier.com/blog/archives/2005/02/sha1_broken.html > > Schneier writes: > > # SHA-1 Broken > # > # SHA-1 has been broken. Not a reduced-round version. Not a > # simplified version. The real thing. > # > # The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu > # (mostly from Shandong University in China) have been quietly > # circulating a paper announcing their results: > # > # * collisions in the the full SHA-1 in 2**69 hash operations, > # much less than the brute-force attack of 2**80 operations > # based on the hash length. > # > # * collisions in SHA-0 in 2**39 operations. > # > # * collisions in 58-round SHA-1 in 2**33 operations. > # > # This attack builds on previous attacks on SHA-0 and SHA-1, and > # is a major, major cryptanalytic result. It pretty much puts a > # bullet into SHA-1 as a hash function for digital signatures > # (although it doesn't affect applications such as HMAC where > # collisions aren't important). > # > # The paper isn't generally available yet. At this point I can't > # tell if the attack is real, but the paper looks good and this > # is a reputable research team. > # > # More details when I have them. 
> > - Gordon @ Bitzi From osokin at osokin.com Wed Feb 16 08:11:07 2005 From: osokin at osokin.com (Serguei Osokine) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: <4212DCF1.1070909@bitzi.com> Message-ID: > # * collisions in the the full SHA-1 in 2**69 hash operations, > # much less than the brute-force attack of 2**80 operations... Okay, so the effective SHA-1 length is 138 bits instead of full 160 - so what's the big deal? It is still way more than, say, MD5 length. And MD5 is still widely used for stuff like content id'ing in various systems, because even 128 bits is quite a lot, never mind 138 bits. Best wishes - S.Osokine. 16 Feb 2005. -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org]On Behalf Of Gordon Mohr (@ Bitzi) Sent: Tuesday, February 15, 2005 9:41 PM To: p2p-hackers Subject: [p2p-hackers] SHA1 broken? Via Slashdot, as reported by Bruce Schneier: http://www.schneier.com/blog/archives/2005/02/sha1_broken.html Schneier writes: # SHA-1 Broken # # SHA-1 has been broken. Not a reduced-round version. Not a # simplified version. The real thing. # # The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu # (mostly from Shandong University in China) have been quietly # circulating a paper announcing their results: # # * collisions in the the full SHA-1 in 2**69 hash operations, # much less than the brute-force attack of 2**80 operations # based on the hash length. # # * collisions in SHA-0 in 2**39 operations. # # * collisions in 58-round SHA-1 in 2**33 operations. 
# # This attack builds on previous attacks on SHA-0 and SHA-1, and # is a major, major cryptanalytic result. It pretty much puts a # bullet into SHA-1 as a hash function for digital signatures # (although it doesn't affect applications such as HMAC where # collisions aren't important). # # The paper isn't generally available yet. At this point I can't # tell if the attack is real, but the paper looks good and this # is a reputable research team. # # More details when I have them. - Gordon @ Bitzi From gojomo at bitzi.com Wed Feb 16 09:10:13 2005 From: gojomo at bitzi.com (Gordon Mohr (@ Bitzi)) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: References: Message-ID: <42130DF5.3020708@bitzi.com> Serguei Osokine wrote: >># * collisions in the the full SHA-1 in 2**69 hash operations, >># much less than the brute-force attack of 2**80 operations... > > > Okay, so the effective SHA-1 length is 138 bits instead of full > 160 - so what's the big deal? If the results hold up: SHA1 is not as strong as it was designed to be, and its effective strength is being sent in the wrong direction, rather than being confirmed, by new research. Even while maintaining that SHA1 was unbroken and likely to remain so just last week, NIST was still recommending that SHA1 be phased out of government use by 2010: http://www.fcw.com/fcw/articles/2005/0207/web-hash-02-07-05.asp One more paper from a group of precocious researchers anywhere in the world, or unpublished result exploited in secret, could topple SHA1 from practical use entirely. 
Of course, that's remotely possible with any hash, but the pattern of recent results suggest that a further break is now more likely with SHA1 (and related hashes) than others. So the big deal would be: don't rely on SHA1 in any applications you intend to have a long effective life. > It is still way more than, say, MD5 > length. And MD5 is still widely used for stuff like content id'ing > in various systems, because even 128 bits is quite a lot, never > mind 138 bits. Just because it's widely used doesn't mean it's a good idea. MD5 should not be used for content identification, given the ability to create content pairs with the same MD5, with one version being (and appearing and acquiring a reputation for being) innocuous, and the other version malicious. - Gordon @ Bitzi From paul at ref.nmedia.net Wed Feb 16 13:15:36 2005 From: paul at ref.nmedia.net (Paul Campbell) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: <4212DCF1.1070909@bitzi.com> References: <4212DCF1.1070909@bitzi.com> Message-ID: <20050216131536.GA27730@ref.nmedia.net> On Tue, Feb 15, 2005 at 09:41:05PM -0800, Gordon Mohr (@ Bitzi) wrote: > Via Slashdot, as reported by Bruce Schneier: > > http://www.schneier.com/blog/archives/2005/02/sha1_broken.html > > Schneier writes: > > # SHA-1 Broken I saw this a few months ago. It's not just SHA-1. All ciphers based on the MD-5 S-box design are apparently vulnerable. At this point, it appears that there are two options for the future: 1. Go to something with a larger internal state (256-bit state), and that is NOT just an extended version of the original (as the extended SHA standards attempt to do). 2. Go to a completely different type of cipher. The choices right now are either digital signatures via elliptic curves, or else using one of the stream cipher designs. Since neither one is really optimized for hashing-type operations, they are essentially no-go's for most P2P uses (e.g. DHT's). 
When I say "optimized", by that I mean very SLOW by the way. From ap at hamachi.cc Wed Feb 16 16:03:47 2005 From: ap at hamachi.cc (Alex Pankratov) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: <20050216131536.GA27730@ref.nmedia.net> References: <4212DCF1.1070909@bitzi.com> <20050216131536.GA27730@ref.nmedia.net> Message-ID: <42136EE3.4000001@hamachi.cc> Paul Campbell wrote: > 2. Go to a completely different type of cipher. The choices right now are > either digital signatures via elliptic curves, ... By the way - is ECC patented ? I heard Sun had some activity around ECC patents, Certicom has patents for a curve selection algorithms, but is core ECC patented ? Or rather - is it in public domain or not ? I am seriously considering ECDSA as a replacement for RSA as it seems to be significantly faster for the same crypto strength. From Serguei.Osokine at efi.com Wed Feb 16 16:37:31 2005 From: Serguei.Osokine at efi.com (Serguei Osokine) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? Message-ID: <4A60C83D027E224BAA4550FB1A2B120E0DC35B@fcexmb04.efi.internal> On Wednesday, February 16, 2005 Gordon Mohr wrote: > MD5 should not be used for content identification, given the > ability to create content pairs with the same MD5, with one > version being (and appearing and acquiring a reputation for > being) innocuous, and the other version malicious. Right. So let's go and try to find something with the same MD5 as this letter of mine, shall we? :-) For any practical purpose that I can imagine in a content identification field, MD5 is just fine. And SHA-1 is even more fine. There are plenty more simple ways to attack the CDN nets than MD5 collisions. Way more simple. And abandoning MD5 for SHA1, then SHA1 for Tiger, and then abandoning Tiger for some newer hash when some researcher finds that it is really twenty bits weaker than you thought - it is all just a huge waste of development effort, as far as I'm concerned. 
It sure is nice to know that the human mind can find collisions in a 160-bit hash, but I have a feeling that the practical meaning of this result in the content identification area is precisely zero. Probably the biggest effect will be that the more advanced of the marketing types will start saying with a knowing look: "ah, but SHA1 was compromised - shouldn't we use something more secure?" Which is a plenty effect by itself, I'll grant you that. It will be way easier to switch to a newer hash than to explain to these guys that this is all a load of bull. But this is a Chicken Little effect, which is of a psychological rather than of a technical nature, and I'd expect to find the concerns about SHA1 weakness on some marketing forum rather than here. (All of the above is only about the content identification in the P2P nets, of course. Security/authentication is a different story. But saying that MD5 should not be used for the content identification does seem like a bit of an overstatement to me. I mean, imagine yourself a Gnutella network - so its biggest, major, noticeable, or even existing concern is a collision in the content hashes? Are you kidding? :-) Best wishes - S.Osokine. 16 Feb 2005. -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org]On Behalf Of Gordon Mohr (@ Bitzi) Sent: Wednesday, February 16, 2005 1:10 AM To: Peer-to-peer development. Subject: Re: [p2p-hackers] SHA1 broken? Serguei Osokine wrote: >># * collisions in the the full SHA-1 in 2**69 hash operations, >># much less than the brute-force attack of 2**80 operations... > > > Okay, so the effective SHA-1 length is 138 bits instead of full > 160 - so what's the big deal? If the results hold up: SHA1 is not as strong as it was designed to be, and its effective strength is being sent in the wrong direction, rather than being confirmed, by new research. 
Even while maintaining that SHA1 was unbroken and likely to remain so just last week, NIST was still recommending that SHA1 be phased out of government use by 2010: http://www.fcw.com/fcw/articles/2005/0207/web-hash-02-07-05.asp One more paper from a group of precocious researchers anywhere in the world, or unpublished result exploited in secret, could topple SHA1 from practical use entirely. Of course, that's remotely possible with any hash, but the pattern of recent results suggest that a further break is now more likely with SHA1 (and related hashes) than others. So the big deal would be: don't rely on SHA1 in any applications you intend to have a long effective life. > It is still way more than, say, MD5 > length. And MD5 is still widely used for stuff like content id'ing > in various systems, because even 128 bits is quite a lot, never > mind 138 bits. Just because it's widely used doesn't mean it's a good idea. MD5 should not be used for content identification, given the ability to create content pairs with the same MD5, with one version being (and appearing and acquiring a reputation for being) innocuous, and the other version malicious. - Gordon @ Bitzi _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From lloyd at randombit.net Wed Feb 16 22:05:17 2005 From: lloyd at randombit.net (Jack Lloyd) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? 
In-Reply-To: <20050216131536.GA27730@ref.nmedia.net> References: <4212DCF1.1070909@bitzi.com> <20050216131536.GA27730@ref.nmedia.net> Message-ID: <20050216220516.GC29536@randombit.net> On Wed, Feb 16, 2005 at 05:15:36AM -0800, Paul Campbell wrote: > On Tue, Feb 15, 2005 at 09:41:05PM -0800, Gordon Mohr (@ Bitzi) wrote: > > Via Slashdot, as reported by Bruce Schneier: > > > > http://www.schneier.com/blog/archives/2005/02/sha1_broken.html > > > > Schneier writes: > > > > # SHA-1 Broken > > I saw this a few months ago. It's not just SHA-1. All ciphers based on the > MD-5 S-box design are apparently vulnerable. At this point, it appears that > there are two options for the future: No, there were no major results against full 80-round SHA-1 until this. There were collisions with ~50 of the 80 rounds for SHA-1, and Joux found a collision for SHA-0 around the same time Wang et al. produced the collisions for MD4/MD5/RIPEMD/HAVAL-128 last summer. BTW, MD5 does not use S-Boxes in any form. > 1. Go to something with a larger internal state (256-bit state), and that is > NOT just an extended version of the original (as the extended SHA standards > attempt to do). Currently Whirlpool is looking like the best bet. Tiger is still out there, and is both reasonably fast on 32-bit machines and very fast on 64-bit, but it never saw much analysis, as the designers expected the 64-bit revolution about 8 years too early. Both are quite unlike the MDx designs, which is both good (possibly less likely to fall to whatever methods Wang and crew have), and bad (less analysis has been done). A major issue is that currently the details of the attacks haven't been published. All we really have right now is a set of collisions for various hashes, which proves that there are weaknesses, but until we know the details there is no way to say that they will or won't apply to Whirlpool/Tiger/SHA-2/etc.
Fortunately the 2^69 work limit on SHA-1 is currently theoretical for everyone but the TLAs, so the paper will have to explain the attack in sufficient detail to verify the results, from which people more competent than me can see if the attacks do (or might) apply to the latest generation of hash functions. The real key is not just to upgrade, but to provide a smooth upgrade path in the future. Before SHA-1, the average security lifetime of a hash was about 5 years. I suspect we're seeing a return to that level of cycling; for the most part analysis of hash functions is not nearly as developed as that for block ciphers. > > 2. Go to a completely different type of cipher. The choices right now are > either digital signatures via elliptic curves, or else using one of the ECDSA and ECNR still use conventional hash functions; you don't reduce the impact of an attack on SHA-1 by using either of those as compared to DSA or RSA. > stream cipher designs. I am not aware of any methods of hashing with just a stream cipher; are you referring to Panama? Panama's stream cipher mode is still secure AFAIK, but the Panama transform has been shown insecure for hashing (IIRC with 2^80 operations, versus the expected 2^128). Regards, Jack From gojomo at bitzi.com Thu Feb 17 04:12:18 2005 From: gojomo at bitzi.com (Gordon Mohr (@ Bitzi)) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: <4A60C83D027E224BAA4550FB1A2B120E0DC35B@fcexmb04.efi.internal> References: <4A60C83D027E224BAA4550FB1A2B120E0DC35B@fcexmb04.efi.internal> Message-ID: <421419A2.80307@bitzi.com> Serguei Osokine wrote: > On Wednesday, February 16, 2005 Gordon Mohr wrote: > >>MD5 should not be used for content identification, given the >>ability to create content pairs with the same MD5, with one >>version being (and appearing and acquiring a reputation for >>being) innocuous, and the other version malicious. > > > Right.
So let's go and try to find something with the same > MD5 as this letter of mine, shall we? :-) I can't -- but you could have made a collision, very easily, if you composed your initial message with the intent of also composing an MD5 twin at the same time. That means for content identification MD5 is fatally flawed. For any file whose contents I think I know and trust, perhaps based on analysis and history of the file, there could be another dangerous file with the same MD5. MD5 cannot be used to distinguish between the two, but that's the whole point of using a secure hash for content identification. Dan Kaminsky runs over a number of potential attacks that are relevant to P2P -- see: http://paketto.doxpara.com Don't be fooled by the title of his analysis, "MD to be considered harmful someday" -- the attacks mentioned are possible now, and could trick people and software in subtle ways different from other threats to P2P nets. Here's another example from the cryptography list that convinced a doubter that the attacks on MD5 were of more than purely theoretical interest: two long binary strings, one a prime number, one not: http://lists.virus.org/cryptography-0412/msg00102.html Consider source code or executables which work fine with the primes, s-boxes, and other initialization vectors initially examined -- but have exploitable flaws when those values are perturbed in a manner that leaves the MD5 the same. You need to use a different, stronger content check to prevent such mischief -- making the use of MD5 redundant and even dangerous for the false sense of security it gives. > For any practical purpose that I can imagine in a content > identification field, MD5 is just fine. And SHA-1 is even more > fine. If you can't imagine exploits, perhaps it's just a failure of your imagination. Prudent engineering would assume some attackers have better imaginations than you, when it comes to exploiting hashes that don't work as originally intended. 
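The "stronger content check" mentioned above might look like the following minimal sketch. It assumes SHA-256 as the stronger hash (the thread does not name one), and `content_id` and `verify_content` are hypothetical helper names, not part of any P2P client discussed here.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Identify content by a hash with no known practical collision attack.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(data).hexdigest()

def verify_content(data: bytes, expected_id: str) -> bool:
    """Accept downloaded bytes only if they hash to the identifier we trust."""
    return content_id(data) == expected_id

# A reviewer inspects a file and publishes its identifier.
reviewed = b"installer bytes that were inspected and found harmless"
published_id = content_id(reviewed)

# A peer later serves altered bytes under the same name; the check rejects them.
tampered = b"installer bytes that were inspected and found harmless!"
print(verify_content(reviewed, published_id))
print(verify_content(tampered, published_id))
```

The check is only as good as the collision resistance of the hash behind it: with MD5 in place of SHA-256, an author who prepared a colliding pair in advance could pass both files through `verify_content` with the same identifier.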
> There are plenty more simple ways to attack the CDN nets > than MD5 collisions. Way more simple. And abandoning MD5 for > SHA1, then SHA1 for Tiger, and then abandoning Tiger for some > newer hash when some researcher finds that it is really twenty > bits weaker than you thought - it is all just a huge waste of > development effort, as far as I'm concerned. Depends on the kinds of attacks you're worried about. There are more simple ways to disrupt P2P nets, sure. But are there more simple ways to trick conscientious, hash-checking users into running malware? And since when did the ease of other attacks become an excuse for ignoring more complicated and subtle (and thus perhaps more valuable) attacks? If you need a secure hash's properties in your software, you should use an uncompromised secure hash. (Results as early as 1996 suggested MD5 should not be used in applications where collision-resistance is important.) If you're stuck with a legacy hash, fine, analyze the situation and if you're confident the weakness has no effect on current usage, rationalize using it a while longer. But get ready for the potential need to switch hashes quickly in the presence of further discoveries. Or better yet: design with the idea in mind that no hash function lives forever. - Gordon @ Bitzi From osokin at osokin.com Thu Feb 17 07:37:55 2005 From: osokin at osokin.com (Serguei Osokine) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: <421419A2.80307@bitzi.com> Message-ID: On Wednesday, February 16, 2005 Gordon Mohr wrote: > Dan Kaminsky runs over a number of potential attacks that > are relevant to P2P -- see: > > http://paketto.doxpara.com > ... > Here's another example from the cryptography list that convinced > a doubter... Certainly looks cute. 
Now correct me if I'm not getting something here - but isn't it true that in order to mount an attack one has to replace the "good" code (content, whatever) by the "bad" code, and the absolutely necessary condition is that the "good" code also has to be created by an attacker? So an attacker creates "good" code, gives it to security experts for verification, and then after they are done, replaces it with "bad code", right? Isn't it a bit far-fetched? Do we have a somewhat more realistic attack scenario? I just cannot imagine all this happening in real life. Real-life breakdowns always tend to be way simpler than their theoretical scenarios (and totally unexpected, too). > But are there more simple ways to trick conscientious, hash-checking > users into running malware? Users typically don't give a damn about hash-checking; they expect the system to do that for them. And a few users that do give a damn typically can defend themselves from pretty much anything no matter what you throw at them. So the fate of this "expert" group (consisting of about ten people for any given P2P system) does not really worry me, whereas for the rest of the user population there are plenty of ways to trick them into running the malware - *all* the current ways of doing so are simpler than fiddling with hashes. Which brings me back to my question above: do we have a realistic scenario where a network like Gnutella would be harmed by using MD5? (Not that I give a damn about MD5, and no one in Gnutella probably uses it anyway; my interest is largely theoretical here, and the same issues might be relevant for the other hashes, either.) > And since when did the ease of other attacks become an excuse > for ignoring more complicated and subtle (and thus perhaps > more valuable) attacks? Why, every time you do not have infinite development resources, of course. 
You always have to juggle priorities, and subtle attacks typically are not anywhere close to the head of the development priority list for P2P networks... > Or better yet: design with the idea in mind that no hash function > lives forever. Sure; but that's orthogonal: > If you're stuck with a legacy hash, fine, analyze the situation > and if you're confident the weakness has no effect on current > usage, rationalize using it a while longer. My point exactly. The issue is whether one should consider the deployed legacy codebase unsecure after every new discovery is made in the hash collision research or not. My personal approach would be to disregard the possible collision issues until there is a problem serious enough to be noticed by CNN. (So far I still cannot see any *realistic* attack scenario; maybe your next letter will convince me that I'm wrong :-) But everyone has a personal "worry threshold", I guess. Mine is pretty low... Best wishes - S.Osokine. 16 Feb 2005.
From em at em.no-ip.com Thu Feb 17 11:11:13 2005 From: em at em.no-ip.com (Enzo Michelangeli) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? References: <4212DCF1.1070909@bitzi.com><20050216131536.GA27730@ref.nmedia.net> <42136EE3.4000001@hamachi.cc> Message-ID: <005b01c514e1$6dcee580$0200a8c0@em.noip.com> ----- Original Message ----- From: "Alex Pankratov" To: "Peer-to-peer development." Sent: Thursday, February 17, 2005 12:03 AM Subject: Re: [p2p-hackers] SHA1 broken? [...]
> By the way - is ECC patented ? I heard Sun had some activity around > ECC patents, Certicom has patents for a curve selection algorithms, > but is core ECC patented ? Or rather - is it in public domain or not ? Answers to patent-related questions are not Turing computable ;-) Anyway, several years ago the IEEE made an effort to collect statements and claims about intellectual property on PK encryption algorithms: http://grouper.ieee.org/groups/1363/P1363/patents.html Several of the letters collected refer to EC-related areas (Nyberg-Rueppel signatures, point compression techniques, etc.) Enzo From gojomo at bitzi.com Thu Feb 17 18:23:51 2005 From: gojomo at bitzi.com (Gordon Mohr (@ Bitzi)) Date: Sat Dec 9 22:12:50 2006 Subject: [p2p-hackers] SHA1 broken? In-Reply-To: References: Message-ID: <4214E137.8000109@bitzi.com> Serguei Osokine wrote: > On Wednesday, February 16, 2005 Gordon Mohr wrote: > >>Dan Kaminsky runs over a number of potential attacks that >>are relevant to P2P -- see: >> >> http://paketto.doxpara.com >>... >>Here's another example from the cryptography list that convinced >>a doubter... > > > Certainly looks cute. Now correct me if I'm not getting something > here - but isn't it true that in order to mount an attack one has to > replace the "good" code (content, whatever) by the "bad" code, and the > absolutely necessary condition is that the "good" code also has to be > created by an attacker? So an attacker creates "good" code, gives it > to security experts for verification, and then after they are done, > replaces it with "bad code", right? Yes. > Isn't it a bit far-fetched? Do we have a somewhat more realistic > attack scenario? I just cannot imagine all this happening in real > life. Real-life breakdowns always tend to be way simpler than their > theoretical scenarios (and totally unexpected, too). It's possible. It's not that hard. 
It would offer rewards to an attacker that are different and possibly larger than those offered by the simple tricks that reel in easy marks. So it doesn't seem that far-fetched to me. >>But are there more simple ways to trick conscientious, hash-checking >>users into running malware? > > > Users typically don't give a damn about hash-checking; they > expect the system to do that for them. And a few users that do give > a damn typically can defend themselves from pretty much anything no > matter what you throw at them. So the fate of this "expert" group > (consisting of about ten people for any given P2P system) does not > really worry me, whereas for the rest of the user population there > are plenty of ways to trick them into running the malware - *all* > the current ways of doing so are simpler than fiddling with hashes. If your attack is just to get someone, somewhere to run your malware, sure. But the average/mass user is not the only interesting case. If you want to get onto other, higher-valued machines, you have to get around the real practices of many users, of various sophistication, who do care about hashes of received content matching expected values. For such people, to get them to settle for MD5, you either convince them not to worry about the potential attack -- making them potential victims -- or you lose them as users, because they realize that the hashes used for content-identification on your network do not offer the guarantee they seek. That's not a good result. I want P2P+CDN that delivers content that I and other sophisticated users can trust, and I want the unsophisticated users on the same network, too: I gain from their presence as peers/ seeds, and they can gain from my insistence on rigorous content identification. > Which brings me back to my question above: do we have a > realistic scenario where a network like Gnutella would be harmed by > using MD5? 
Having installers like the fire.exe/ice.exe described by Kaminsky, which have the same MD5 but install different software, could quickly undermine confidence in an MD5-only P2P network for most kinds of content delivery. Telling average users (or businesses considering P2P delivery), "but that's only when the attacker gets to create both files", is noise to them. (And for pro users, telling them that they have to trust the original creator of the file not to have created twins is tantamount to requiring the content to be separately digitally signed to prove origination -- an additional step rendering the plain standalone MD5 for content identification superfluous.) > (Not that I give a damn about MD5, and no one in Gnutella probably > uses it anyway; my interest is largely theoretical here, and the same > issues might be relevant for the other hashes, either.) > > >>And since when did the ease of other attacks become an excuse >>for ignoring more complicated and subtle (and thus perhaps >>more valuable) attacks? > > > Why, every time you do not have infinite development resources, > of course. You always have to juggle priorities, and subtle attacks > typically are not anywhere close to the head of the development > priority list for P2P networks... Of course work has to be prioritized in context. But the priority list is not a single-file line, where a few frontmost entries prevent consideration of everything else. In particular, I would guess the "head of the development priority list" for most commercial P2P networks is dominated by user satisfaction issues. But these are only remedied incrementally, with research and trial and error. The risk of delay is incremental competitive decay, and the work is never really "done". At the same time, developers can be addressing other specific flaws -- failures of the software and chosen algorithms to deliver the functionality intended. Such flaws can't be ignored forever. 
They may be easy to fix with a discrete amount of effort. And since transitioning hash functions requires lead time, the groundwork should be laid before any change is urgent. >>Or better yet: design with the idea in mind that no hash function >>lives forever. > > > Sure; but that's orthogonal: > > >>If you're stuck with a legacy hash, fine, analyze the situation >>and if you're confident the weakness has no effect on current >>usage, rationalize using it a while longer. > > > My point exactly. The issue is whether one should consider the > deployed legacy codebase unsecure after every new discovery is made > in the hash collision research or not. My personal approach would be > to disregard the possible collision issues until there is a problem > serious enough to be noticed by CNN. (So far I still cannot see any > *realistic* attack scenario; maybe your next letter will convince me > that I'm wrong :-) But everyone has a personal "worry threshold", > I guess. Mine is pretty low... I suppose it depends on how high your ambitions for P2P are. Clearly, you can have a very popular network with a very weak hash for quite a while -- witness ED2K, using MD4, a hash "broken" for over a decade. But over time, users have become more aware of the importance of hash-based content-verification, and users have generally migrated in the direction of more-rigorous hash-using networks -- though not to the *most* rigorous networks. If P2P is just a leisure-time lark for credulous, casual users who have many other unhygienic computing practices, then you can be lackadaisical in your use of hash algorithms. If you want it to also be a platform stable for long-term use by more discriminating users and commercial endeavors, you should take the strength of your hashes seriously. If you wait until someone is hurt enough that the damage is reported on CNN, that's too long. - Gordon @ Bitzi > Best wishes - > S.Osokine. > 16 Feb 2005.
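One way to lay that groundwork, and to design as though no hash function lives forever, is to tag every content identifier with the algorithm that produced it. The following is a rough sketch under stated assumptions: the `algorithm:hexdigest` format, the `ACCEPTED` policy set, and the function names are all hypothetical and not taken from any network discussed in this thread.

```python
import hashlib

# Hypothetical policy: algorithms this client still trusts for verification.
# Retiring a hash becomes a one-line change here, not a protocol redesign.
ACCEPTED = {"sha256", "sha512"}

def make_id(data: bytes, algorithm: str = "sha256") -> str:
    """Build a self-describing identifier of the form 'algorithm:hexdigest'."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{algorithm}:{digest}"

def verify(data: bytes, identifier: str) -> bool:
    """Check data against a tagged identifier, refusing retired algorithms."""
    algorithm, _, digest = identifier.partition(":")
    if algorithm not in ACCEPTED:
        return False  # e.g. an 'md5:' or 'sha1:' identifier after phase-out
    return hashlib.new(algorithm, data).hexdigest() == digest

data = b"some shared file"
new_id = make_id(data)                    # tagged with the current default
alt_id = make_id(data, "sha512")          # the same content under another hash
legacy_id = "md5:" + "0" * 32             # stands in for a pre-migration identifier

print(verify(data, new_id))
print(verify(data, alt_id))
print(verify(data, legacy_id))
```

Because old and new identifiers can coexist during a transition, peers can publish both for a while and drop the weaker one once enough of the network has upgraded.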
From Serguei.Osokine at efi.com Thu Feb 17 18:37:34 2005 From: Serguei.Osokine at efi.com (Serguei Osokine) Date: Sat Dec 9 22:12:50 2006 Subject: