Programming Blabs: Ideal Distributed File System

Monday, November 22, 2004

Ideal Distributed File System

In this day and age, we access our data from a variety of devices, often from many different corners of the world. At the moment there is no easy way to synchronize that data across those devices. Here are the first two examples that come to mind.

1) I have MP3s stored on my computer and I also own an iPod. To transfer MP3s to the iPod I need to connect it to the computer with a cable, and then use some kind of application to copy them over. If I download a few more MP3s and want them on the iPod, I need to repeat that process. Now say I left for a vacation in Europe, but forgot to add that awesome song to the iPod. What do I do? Well, there isn't much I can do.

2) I have an address book on my Gmail account, but I also have an address book in my cell phone. How do I synchronize the two? There isn't an easy way. Manual data entry is necessary.

Now consider something different. I'm using Gmail, but it only offers 1GB of storage. I'm also using a photo hosting service that only offers, say, 1GB of storage. I also have a web page, but my ISP only gives me 50MB of storage. See anything wrong with this picture? There are two problems that come to mind:

1) Inefficient utilization of space. If I'm only using 100MB of my Gmail storage, but all 50MB for my web page, the 900MB still reserved for me at Gmail are essentially unused, and could be used for my web page. I have to admit, though, that this isn't much of a problem, as space is becoming cheaper and cheaper, and I believe that within 5 years we'll completely stop worrying about space shortage.

2) There isn't an easy way to search for my data, since it's broken up across different vendors.
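To put numbers on the first problem, here is a rough sketch in Python using the figures from the example above. The `Quota` type and the service names are made up for illustration, and the photo hosting service is left out because its usage isn't stated.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Storage reserved for one service, in megabytes (hypothetical type)."""
    name: str
    reserved_mb: int
    used_mb: int

    @property
    def free_mb(self) -> int:
        return self.reserved_mb - self.used_mb


# Figures from the example above; 1GB is treated as 1000MB, as in the post.
quotas = [
    Quota("gmail", reserved_mb=1000, used_mb=100),
    Quota("web_page", reserved_mb=50, used_mb=50),
]

# Separate silos: each service can only grow into its own free space.
free_per_service = {q.name: q.free_mb for q in quotas}

# One shared pool: any service can grow into whatever is left overall.
pooled_free_mb = sum(q.reserved_mb for q in quotas) - sum(q.used_mb for q in quotas)

print(free_per_service)
print(pooled_free_mb)
```

With separate quotas the web page has nothing left to grow into, while with a shared pool the same 900MB would be available to whichever service needs it.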

Now, here is my proposed solution to the above problems.

All the major players (MS, IBM, Google, Yahoo, etc.) should come together and agree on a standard for a distributed file system. Vendors would then implement such a file system and offer storage space to users. The user could then instruct their PC, Gmail, photo hosting service, iPod, cell phone, etc. to use that file system for storage. (For example, if a user uses Gmail, e-mails and other Gmail-generated data would be stored in the space that user bought from a storage vendor.)
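As a very rough illustration of what "a standard that vendors implement" could look like from an application's point of view, here is a minimal Python sketch. Everything in it is hypothetical: the `StorageVendor` interface, its method names, the paths, and the in-memory stand-in for a vendor are all invented for this example, and a real standard would also have to cover authentication, permissions, and network access.

```python
from abc import ABC, abstractmethod


class StorageVendor(ABC):
    """Hypothetical common interface every storage vendor would implement."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def list_paths(self, prefix: str = "") -> list[str]: ...


class InMemoryVendor(StorageVendor):
    """Toy stand-in for a real vendor: keeps everything in a dict."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def list_paths(self, prefix: str = "") -> list[str]:
        return sorted(p for p in self._files if p.startswith(prefix))


# Each application is pointed at the same user-chosen storage.
storage = InMemoryVendor()
storage.write("/mail/inbox/0001.eml", b"Subject: hello")
storage.write("/photos/2004/beach.jpg", b"<jpeg bytes>")
storage.write("/music/awesome_song.mp3", b"<mp3 bytes>")

# Because everything lives in one namespace, one query covers all of it.
all_paths = storage.list_paths()
mail_only = storage.list_paths("/mail/")
print(all_paths)
print(mail_only)
```

The point of the sketch is only that the mail service, the photo service, and the music player all talk to the same interface, so the data ends up in one place that can be listed and searched as a whole.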

Note that such storage space does not completely replace the hard drive (or another type of local "slow" memory), as it's still necessary for bootstrapping and caching purposes. However, such a service would allow me to install Windows, Linux, etc. in my storage, so that I could then "boot up" my system with all my files from a library computer in Australia.

There are a number of serious issues with this approach, the main one being security, and obviously constant connectivity to the storage vendor is necessary. However, I do think that this approach offers many benefits to the user, and I hope that someday we will see something like this come to life.

3 Comments:

At November 22, 2004 at 8:30 PM, Alex Pilchin said...: OK, so this sounds like a worthy and great dream.
However, let's consider this in logical steps.

First, let's forget about the distributed system for a second and suppose you have a computer at home connected to the internet, which acts as a server and listens for messages over the web (from your devices). It also sends HTTP responses in HTML or another appropriate format, which you can access from anywhere in the world through some kind of client (a browser). Also suppose that this server has access to most of what you were describing (i.e. email, MP3s, video, images, address book, etc.), all of which is stored on your computer, and that it can relay this data to you. You can even assume that it provides some sort of HTML GUI that allows you to control the system remotely. This is essentially what you are looking to accomplish.

Now the questions are
Q1) what are the limitations?
A1: Well, your cell phone, iPod, etc. do not support high-speed internet access, which is why you can't retrieve the data. Hence, before we even start thinking of a distributed file system that supports all of the devices in the world, we have to connect these devices to the internet with a fast connection!

If you had all these devices connected, you could do the majority of what you've described today via this theoretical, yet practically realizable, system, without the need for a distributed file system.

Other Limitations:
1) Network speed
2) Lack of various relevant resources on the device
3) Support for a consistent GUI on devices (i.e. it would be ideal if all devices supported HTML and were able to display it)
...

Q2) Why not wait for a distributed file system?
A: Well, for one thing, a file system that is supported by all of the companies you've mentioned is not likely to come any time soon, unless some company with its own file system muscles into the market and offers something so unique and superior to the others that everyone just starts using and supporting it.

The reason it won't come soon is that it is not in the interest of many of the companies you've mentioned, since it would directly conflict with some of their strategic product sales. But that is another story for another blog article.

Q3) What can you do at the moment to make this dream a reality today?

A: I think what I've described is realizable today and can be accomplished with some hard work, pending a solution to some of the limitations above. Also, your server program can implement support for all kinds of APIs (e.g. Google Search, Gmail, etc.) and perhaps start a movement in which other internet-based storage companies (e.g. the photo album people, blogging companies) are forced to implement an API/web service to grant access to their storage ... All this, however, may involve too much work for one company to swallow.

What would be great about a distributed file system on the web is that, if it were implemented properly, there would be no single source of failure: your data would always be recoverable, and you could always restore your settings, etc. in the case of a virus, a local hardware failure, and so on.

Alex Pilchin
At November 22, 2004 at 8:50 PM, Ilyia Kaushansky said...: The main problem with your approach is a single point of failure. Should the computer that hosts your data "fry", all your information is gone. In my approach, my storage vendor would guarantee 100% availability.

Also, a reliable Internet connection is much more important than a fast one. I think a guaranteed 25K/second would be sufficient for most situations (listening to MP3s, browsing the net, editing a Word document), and 25K/second isn't all that much. I should clarify this as well: if Gmail is using my storage vendor for email storage, it does need a fast connection to that vendor, but that is far easier to arrange than a fast connection between me and the storage vendor. Hope this makes sense.
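As a rough sanity check on the 25K/second figure, here is a small Python calculation. It rests on assumptions that are not in the comment: that "25K/second" means 25 kilobytes per second, that the MP3 is encoded at a constant 128 kilobits per second, and that a typical song is about 4MB.

```python
# Assumptions (not from the comment): "25K/second" means 25 kilobytes per
# second, and the MP3 is encoded at a constant 128 kilobits per second.
LINK_KBYTES_PER_SEC = 25
MP3_KBITS_PER_SEC = 128

# Convert the MP3 bitrate to kilobytes per second (8 bits per byte).
mp3_kbytes_per_sec = MP3_KBITS_PER_SEC / 8

# Bandwidth left over while streaming; non-negative means streaming keeps up.
headroom_kbytes_per_sec = LINK_KBYTES_PER_SEC - mp3_kbytes_per_sec
can_stream = headroom_kbytes_per_sec >= 0

# Time to fetch a whole song up front (4MB is an assumption; 1MB = 1000KB).
SONG_MBYTES = 4
download_seconds = SONG_MBYTES * 1000 / LINK_KBYTES_PER_SEC

print(mp3_kbytes_per_sec, headroom_kbytes_per_sec, can_stream, download_seconds)
```

Under those assumptions, streaming fits within the link with some room to spare, while copying a full song ahead of time takes a couple of minutes.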
At November 22, 2004 at 9:39 PM, Alex Pilchin said...: Well, I did mention the advantages of the distributed system (especially that there is no single source of failure) at the bottom of my comment, thereby implying the obvious: that there is a single source of failure in what I've suggested.

However, the point of my post was to point out a realistic approach that can be implemented today, rather than a dream that demands extraordinary circumstances, and to point out the limitations of such a system today.

What I've suggested can easily extend to a small LAN and therefore provide safety against failures. And you can even put a distributed file system on top if you so desire, which I agree is ideal but unnecessary, since getting all OS vendors to support it would be an Extreeeeemely difficult task.

The main point is that it can be implemented now at the user's end or on a proprietary SAN, to which you would entrust all your content, rather than waiting for the whole network to change!
