Thoughts dump, for Jul-06-2010
Our large-scale, high-performance and highly available data store (those were the goals anyway; I hope we attained them) has been more or less ready for production for a few weeks now.
We have yet to actually deploy it, although two forthcoming (in-development) projects will be built on top of it. There are a few things here and there we could, and will, change (clean up the client library API and all that), but as far as I can tell, there are no real issues left to resolve. During testing, we got up to 40K GET (value by key) and over 50K PUT (value by key) operations per second on a 3-node system (quorum arrangement). Adding nodes increased capacity and throughput, which was one of the design goals.
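For readers unfamiliar with the term, here is a minimal Python sketch of what a quorum arrangement on 3 nodes can look like. It assumes simple majority quorums for both reads and writes and caller-supplied version numbers; the post does not say how the data store actually configures or implements its quorums, and `Replica`, `majority`, `put` and `get` are hypothetical names used only for illustration.

```python
class Replica:
    """One node holding key -> (version, value). Hypothetical, in-memory only."""

    def __init__(self):
        self.up = True
        self.data = {}


def majority(n):
    # Smallest group size such that any two groups of that size overlap.
    return n // 2 + 1


def put(replicas, key, value, version):
    """Write to every reachable replica; succeed only if a majority acknowledged."""
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.data[key] = (version, value)
            acks += 1
    # Note: this sketch does not roll back a write that failed to reach a quorum.
    return acks >= majority(len(replicas))


def get(replicas, key):
    """Read from every reachable replica and return the highest-versioned value."""
    answers = [rep.data.get(key) for rep in replicas if rep.up]
    if len(answers) < majority(len(replicas)):
        raise RuntimeError("not enough replicas reachable for a quorum read")
    found = [a for a in answers if a is not None]
    if not found:
        return None
    return max(found, key=lambda a: a[0])[1]
```

With 3 replicas the majority is 2, so a write that reached any two nodes is always visible to a read that reaches any two nodes, even if one node is down in each case.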
We have a few more similar projects in the pipeline: more building blocks for our services stack. We are going to build two different file systems (one optimized for very high-performance access to files, the other for availability and storage of files not limited in size), a MapReduce framework/infrastructure and a new distributed lock manager, which will also replace the ad-hoc solutions we currently rely on.
I am very proud of our team; they are smart and inventive, passionate and hard working. They let me toy around with ideas that do not always make sense and they always find ways to make me feel great about what we do. Good times ahead.
Tuesday, 6 July 2010 8:33 pm
Update on CloudDS
Here is a progress update on my current main project (we call it 'CloudDS', which stands for cloud data store; it is a silly name, but it will have to do until we can find a replacement).
I have been working on the data store component of the service. It has taken at least four times as much time and effort as I thought it would. A prime reason for underestimating the time requirements is that the initial list of features I wanted to implement doubled in size. In addition, testing most of the possible logic paths that could result in failure also took a long time: even though some of that testing was automated, not all of it was, and validating results is harder than setting up the test environment.
In such a service, it matters little if most of the underlying components fail (I/O and tasks scheduler, garbage collector, cache subsystem, etc.) as long as the data management component is not affected. Suffering a service outage is bad; suffering data corruption and/or data loss is something that has to be prevented by any means necessary.
As it stands, that component now handles reads, writes, self-healing and caching fine, and performs faster than I hoped it would. The data model is based on BigTable, Dynamo, Cassandra and some earlier prototypes/projects we toyed with in the past. It borrows Cassandra's ColumnFamily/SuperColumn/Column key-value representation model. Data are pushed into MemTables and an append-only commit log; MemTables are flushed to disk as SSTables.
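The write path described above (commit log first, then MemTable, then a flush to a sorted on-disk table) can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the CloudDS implementation: `TinyStore`, the JSON line format, the file names and the entry-count flush threshold are all hypothetical, and the column family model is reduced to plain key/value pairs.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: commit log append, in-memory memtable, flush to a sorted file."""

    def __init__(self, directory, memtable_limit=3):
        self.directory = directory
        self.memtable_limit = memtable_limit  # hypothetical threshold (entry count)
        self.memtable = {}
        self.sstables = []  # paths of flushed files, oldest first
        self.log_path = os.path.join(directory, "commit.log")
        self.log = open(self.log_path, "a", encoding="utf-8")

    def put(self, key, value):
        # 1. Append to the commit log and force it to disk before acknowledging.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Apply the write to the in-memory table.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable as an immutable file sorted by key (the "SSTable").
        path = os.path.join(self.directory, f"sstable-{len(self.sstables):04d}.json")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):
                f.write(json.dumps([key, self.memtable[key]]) + "\n")
        self.sstables.append(path)
        self.memtable = {}
        # The flushed entries are on disk now, so the commit log can start over.
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search flushed tables from newest to oldest; the first hit wins.
        for path in reversed(self.sstables):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    k, v = json.loads(line)
                    if k == key:
                        return v
        return None

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = TinyStore(d, memtable_limit=2)
        store.put("user:1", "alice")
        store.put("user:2", "bob")       # second entry triggers a flush
        store.put("user:1", "alice-v2")  # newer value lives in the memtable
        print(store.get("user:1"), store.get("user:2"))
        store.close()
```

A real store would use a binary format with an index instead of scanning files line by line, and would replay the commit log on startup; both are left out to keep the sketch short.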
The GCollector merges SSTables whenever required to reclaim space, resolve conflicts, extract a single value out of multiple versions, etc. All operations supported by Cassandra are implemented (query by path, predicate, column names, key ranges, etc.), and CloudDS clients/users will also be able to use a scripting language to describe explicitly, down to the byte, what they need (e.g. give me the first couple of bytes of those values, or give me a concatenation of those values, and so on).
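As a rough illustration of that merge step, the Python sketch below combines several key-sorted tables into one and keeps a single version per key. It assumes that each row is a `(key, timestamp, value)` tuple, that the highest timestamp wins, and that a value of `None` marks a deletion; none of these details come from the GCollector itself, and `compact` is a hypothetical name.

```python
import heapq


def compact(*sstables):
    """Merge key-sorted runs of (key, timestamp, value) rows into one run.

    Only the newest version of each key is kept. A value of None stands for
    a deletion marker (an assumption of this sketch) and is dropped, which is
    where space gets reclaimed.
    """
    merged = []
    current_key, best = None, None
    # heapq.merge lazily yields rows from all inputs in key order.
    for key, ts, value in heapq.merge(*sstables, key=lambda row: row[0]):
        if key != current_key:
            # A new key starts: emit the winner for the previous key, if any.
            if best is not None and best[2] is not None:
                merged.append(best)
            current_key, best = key, (key, ts, value)
        elif ts > best[1]:
            best = (key, ts, value)
    if best is not None and best[2] is not None:
        merged.append(best)
    return merged


older = [("a", 1, "x"), ("b", 1, "y"), ("c", 1, "z")]
newer = [("a", 2, "x2"), ("c", 3, None)]
result = compact(older, newer)
```

Here `result` holds the newer version of `a`, the only version of `b`, and nothing for `c`, because its latest version is a deletion marker.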
Now that that component is out of the way, I can move on to the rest, which is relatively straightforward to implement (the tasks scheduler and the network I/O subsystems are mostly done).
Friday, 26 March 2010 9:19 pm
On Javascript and simplicity
If you still have doubts about Javascript being the most popular programming language, it's probably because you have not been exposed enough to the effects of the paradigm shift to web-based applications.
Not only does there seem to be more Javascript code (in terms of sheer volume) out there, it's also about the number of users using applications that are driven by it, most of them not really knowing, or wanting to know, what it is, but that's an entirely different story for someone to tell, again.
We are relying on 4 primary programming languages: C/C++ for backend 'stuff', and PHP, Javascript and SGL for frontend/light-weight 'stuff'. Well, we also use bash scripting for _so_ much systems and operations 'stuff', some Python and Perl here and there, as well as some Java and Flash/AS3 for more frontend 'stuff'.
SGL is our home-grown programming language. It stands for Switch Glue Language, Switch being the main framework/library that everything (all services, tools, other libraries, etc.) is based on. The idea is that we can use this language anywhere we want to script operations and 'glue' things (services, resources, operations, etc.) together. Currently, it's used for two major services.
Our frontend developers eventually have to learn, or at least get familiar with, all three of those main frontend languages: PHP, Javascript and SGL. Interestingly enough, Javascript code output surpassed PHP output in terms of volume, mostly because our apps got more functional, fancy, whatever cool bang you get from client-side logic in the browser -- I wouldn't know really, I don't know much about frontend development; our main frontend team does though, and that's all that matters (partial unordered list: phaistonian, hatdi, sug, stelabouras).
Given that SGL has been long overdue for a rewrite (the current language syntax and semantics), I thought I would put aside some time to rewrite it, this time around using Javascript language syntax and semantics so that, when it's ready, we could replace PHP with SGL, thus effectively switching from 3 frontend languages to just one. Our developers, current and future ones, would only need to learn a single language, which may be the greatest benefit of this shift, but it sure is not the only one.
This will be my third attempt at writing a programming language (SGL being the second, PASTE the first... those were the days) and, thanks to Javascript being a standard, it's a 'simple' enough matter of writing an efficient enough VM that will run the emitted bytecode (I am toying with the idea of being able to target PHP and other languages eventually, generating, say, PHP code from SGL code and so on).
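To give an idea of what "a VM that runs the emitted bytecode" means, here is a minimal stack-machine sketch in Python. The instruction set (`PUSH_CONST`, `LOAD`, `STORE`, `ADD`, `LESS`, `JUMP`, `JUMP_IF_FALSE`, `RETURN`) and the tuple encoding are invented for this example and say nothing about the actual SGL bytecode or VM design. The hand-written program corresponds to a Javascript-style loop: `var total = 0; var i = 0; while (i < 5) { total = total + i; i = i + 1; } return total;`.

```python
def run(bytecode, constants):
    """Execute a list of (opcode, operand) pairs on a simple stack machine."""
    stack = []
    variables = {}
    pc = 0  # index of the next instruction to execute
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        pc += 1
        if op == "PUSH_CONST":
            stack.append(constants[arg])
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "STORE":
            variables[arg] = stack.pop()
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "LESS":
            right = stack.pop()
            left = stack.pop()
            stack.append(left < right)
        elif op == "JUMP":
            pc = arg
        elif op == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = arg
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return None


CONSTANTS = [0, 5, 1]
PROGRAM = [
    ("PUSH_CONST", 0), ("STORE", "total"),   # 0-1:  total = 0
    ("PUSH_CONST", 0), ("STORE", "i"),       # 2-3:  i = 0
    ("LOAD", "i"), ("PUSH_CONST", 1),        # 4-5:  push i, push 5
    ("LESS", None), ("JUMP_IF_FALSE", 17),   # 6-7:  leave the loop when i >= 5
    ("LOAD", "total"), ("LOAD", "i"),        # 8-9
    ("ADD", None), ("STORE", "total"),       # 10-11: total = total + i
    ("LOAD", "i"), ("PUSH_CONST", 2),        # 12-13
    ("ADD", None), ("STORE", "i"),           # 14-15: i = i + 1
    ("JUMP", 4),                             # 16:   back to the loop condition
    ("LOAD", "total"), ("RETURN", None),     # 17-18: return total
]

result = run(PROGRAM, CONSTANTS)  # 0 + 1 + 2 + 3 + 4 = 10
```

An efficient VM would use compact numeric opcodes and a dispatch table rather than string comparisons; the strings are here only to keep the program readable.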
So far the lexer, most of the parser and some parts of the VM are in place. Hopefully, there will be enough time and sustained motivation to keep this going (it's a side project, so it can't really preempt current major Phaistos projects) until it's ready, perhaps by the end of the month.
Simplicity is the ultimate sophistication - Leonardo da Vinci
Saturday, 1 August 2009 9:26 pm