[Noisebridge-discuss] Distributed computing and storage (with some major caveats)

Tue Jun 23 06:40:09 UTC 2009

in that case it sounds like in order to start building such a system
you would want to see what's already out there. i am all for a group
where we read papers and explore existing solutions to those problems.
my hope is that enough of us get to a point where we can start
augmenting and creating something new.

jabber: ian at slumbrparty.com
http://www.twitter.com/verbiee

On Mon, Jun 22, 2009 at 11:06 PM, Sai Emrys<noisebridge at saizai.com> wrote:
> On Mon, Jun 22, 2009 at 9:48 PM, Shannon Lee<shannon at scatter.com> wrote:
>> It seems interesting to me.  Does it have to be SQL-style, or could you do
>> some sort of key-value store thing?
>
> As I responded to Jason, I don't especially care if it's SQL per se,
> although it'd be convenient as an already known standard. I just want
> it to support certain features that simple pure key/value stores (e.g.
> memcache) do not - indexing, search by data field, etc. If it can do
> so by implementing a compliant subset of SQL such that it's compatible
> with SQL-based programs and just needs a different database interface
> library, so much the awesomer.
>
> For the most part, I expect the data being stored in this system to be
> structured in broadly the same way as a mysql or other RDBMS-type
> database is (i.e. fields, foreign keys, etc) and thus want to take
> advantage of that fact as much as possible. But certainly some things
> would need to be different - e.g. UUIDs throughout (since you'll have
> severe race conditions and propagation issues), indexing and other
> search as a sort of eDonkey-esque command (since no node will have an
> authoritative list), versioning (in case one node updates a record and
> another hasn't heard about that update yet), etc. Not to mention how
> you deal with version collisions or erstwhile atomic actions (e.g.
> counter increment); I think those are just not possible and have to
> have workarounds (e.g. a minimalist list of events instead of a
> counter thereof; maybe some sort of dated 'rollups' system if that
> gets to be too large). I think the advanced features (like JOIN)
> aren't especially necessary, at least not at first pass.
>
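To make the "list of events instead of a counter" workaround concrete, here is a minimal Python sketch. It assumes each increment is stored as its own event under a fresh UUID and that nodes sync by exchanging their event sets. The class and method names (EventLogCounter, increment, merge, value) are made up for illustration and are not from any existing system.

```python
import uuid


class EventLogCounter:
    """A counter stored as a set of uniquely identified events (hypothetical sketch)."""

    def __init__(self):
        # Maps event UUID -> optional payload describing the event.
        self.events = {}

    def increment(self, payload=None):
        # Each increment is a new event with its own UUID, so two nodes
        # incrementing at the same time never overwrite each other.
        event_id = uuid.uuid4()
        self.events[event_id] = payload
        return event_id

    def merge(self, other):
        # Union by UUID: merging is order-independent, and repeating a
        # merge does not double-count events already seen.
        self.events.update(other.events)

    def value(self):
        # The counter value is simply the number of distinct events.
        return len(self.events)


node_a = EventLogCounter()
node_b = EventLogCounter()
node_a.increment()
node_a.increment()
node_b.increment()

# Nodes exchange events in any order; a repeated merge is harmless.
node_a.merge(node_b)
node_b.merge(node_a)
node_a.merge(node_b)
print(node_a.value(), node_b.value())
```

The event set only ever grows, which is where the dated 'rollups' idea would come in; this sketch leaves that part out.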
> It's also a bit unlike SQL in that you would most likely receive the
> response as an ongoing data stream, rather than an atomic single
> answer, and you may want to work on data as it comes in. (Again, this
> is very similar to how e.g. eDonkey searches work AFAIU.)
>
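The streaming-response idea can be sketched with a Python generator. This is only an illustration: it assumes each node's reply can be modelled as an iterable of row dicts, that every row carries a UUID under a hypothetical "id" key, and it walks the nodes one after another, where a real network would deliver replies asynchronously. The name stream_search is made up.

```python
import uuid


def stream_search(node_replies, predicate):
    """Yield matching rows as they arrive, skipping replicas already seen."""
    seen = set()
    for rows in node_replies:          # one iterable of rows per node
        for row in rows:
            if row["id"] in seen:      # same record replicated on another node
                continue
            if predicate(row):
                seen.add(row["id"])
                yield row              # caller can act on this row right away


shared_id = uuid.uuid4()
node_1 = [{"id": shared_id, "name": "alpha"}, {"id": uuid.uuid4(), "name": "beta"}]
node_2 = [{"id": shared_id, "name": "alpha"}, {"id": uuid.uuid4(), "name": "alps"}]

# Results are consumed one at a time instead of waiting for a full answer.
for row in stream_search([node_1, node_2], lambda r: r["name"].startswith("al")):
    print(row["name"])
```

Because no node has an authoritative list, the caller never knows when the stream is "complete"; it just stops listening at some point.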
> However, I think that each node could operate its own database as
> something SQL-compatible (mysql, sqlite, whatever) - I'm talking here
> about the interface to the amorphous system as a whole.
>
> On Mon, Jun 22, 2009 at 10:26 PM, d p chang<weasel at meer.net> wrote:
>> out of curiosity, is this feature set suggesting that the data come/go
>> w/ the nodes so that an action is local to the node (possibly
>> replicated)?
>
> I'm not sure I understand your question.
>
> All data should be replicated redundantly across nodes, so that it can
> survive any node going down without loss of the data that that node
> may have generated (or proxied).
>
> Action distribution is a tricky problem. For some things, it may be
> simply wasteful to do them multiple times; for others (such as, say, a
> search and stored classification of IP space) it may be actually
> harmful.
>
> On the other hand, to deal with node death and/or malice, you need to
> have multiple paths for any given task subset to get executed and
> verified. (For example, a simple divide-and-conquer strategy would be
> very vulnerable on these points, because the early nodes would control
> a great deal of the overall keyspace. So this would probably need to
> be structured as a graph, not a tree.)
>
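One simple way to get "multiple paths" for a task is to hand each task to several independent nodes and only accept a result that enough of them agree on. The Python sketch below shows that under stated assumptions: results are hashable values, nodes are picked uniformly at random, and the function names (assign_redundantly, verified_result) and the copies/quorum parameters are invented for illustration. It does not address how nodes discover each other or how colluding nodes would be handled.

```python
import random
from collections import Counter


def assign_redundantly(tasks, nodes, copies=3, rng=None):
    """Map each task to `copies` distinct nodes chosen at random."""
    rng = rng or random.Random()
    # random.Random.sample picks without replacement, so the nodes
    # assigned to one task are always distinct.
    return {task: rng.sample(nodes, copies) for task in tasks}


def verified_result(results, quorum):
    """Return the most common result if at least `quorum` nodes agree, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


nodes = ["n1", "n2", "n3", "n4", "n5"]
plan = assign_redundantly(["scan-block-0", "scan-block-1"], nodes, copies=3)
print(plan)

# Two honest nodes agree, one dead or malicious node reports something else.
print(verified_result(["open", "open", "closed"], quorum=2))
# No agreement: the task would have to be reassigned.
print(verified_result(["open", "closed", "filtered"], quorum=2))
```

Random assignment avoids the tree-shaped weakness described above, since no early node gets to decide who handles a large slice of the keyspace.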
> On Mon, Jun 22, 2009 at 10:51 PM, Ian<ian at slumbrparty.com> wrote:
>> this is something that i'm greatly interested in, and i have worked on
>> systems that cover parts of your requirements before. the thing is, what you
>> propose is very complicated to get right.
>
> Definitely. ;-)
>
>> i suggest you do some
>> research (if you haven't already) on what's out there. there are
>> systems out there that already fulfill your requirements individually.
>> maybe you can combine them or augment existing systems. if you would
>> like, we can start a distributed systems/p2p/decentralized networks
>> group at NB so we can learn about these existing things.
>
> Part of the point of this email was to figure out what those systems
> are that I ought to read / rip. ;-)
>
> TBH, my knowledge of such systems is minimal and theoretical. I have a
> BA CogSci degree including various CS stuff, and I've got production
> experience with relatively large (~300k unique users / day) web apps
> using memcache & mysql. And I am superficially familiar with stuff
> like Metasploit, BOINC, & MapReduce. I'm sure that other NBers
> (perhaps you?) can easily outclass me in relevant knowledge.
>
> Anyhow, I'd be interested in such a group.
>
> - Sai