Introduction to the DataStore
Using Google App Engine is really easy for developers: you can use the language you prefer (Java, Python, or one of the many dynamic languages running on top of the JVM) with the usual APIs.
An old Java addict can also use the familiar javax.servlet, javax.mail, javax.cache, javax.persistence and so on, and everything works great, but.....
....but after a while you feel that something is "wrong". Yes, you are using JPA, but your "relations" don't work as expected, and even if you read somewhere that the underlying store is not SQL, you don't realize immediately that SQL is only the tip of the iceberg: the DataStore is not even a relational database. After being shocked for a while I decided to study how this piece of software works (there is not much information, in my opinion, on the official GAE site), because to work well with it you must think the way it was designed.
Let's start with a formal definition.
DataStore defined and main features [1]:
"Google App Engine Datastore is a schema-less persistence system, whose fundamental persistence unit is called Entity, composed by an immutable Key and a collection of mutable properties.
Entities can be created, updated, deleted, loaded by key and queried for properties values.
DataStore is consistent and transactional, with support to current transaction. "
From this (quite) formal definition we can understand that it is a modern DBMS, but under the hood it is very different from Oracle or MySQL. Reading around on the net I understood that its building blocks [2][3][6] are a handful of Google technologies that I tried to represent schematically this way:
Looking at this picture we must consider that:
Transactions and indexes are built on Megastore
File persistence is done with the Google File System
Distribution is coordinated by Chubby
But the guest star is one of the main technologies running at Google:
BigTable: definition [4]
"A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map"
or
It is an associative array that is:
sparse, because only non-null values are persisted
distributed across the Google cloud
persistent on the Google File System
multidimensional in its column values
ordered lexicographically by key
Coming back to Earth, we can say that BigTable is not just a HashTable-like storage: thanks to the way it is designed and implemented, it also gives us the concepts of "logical table" and "logical row". Before introducing these two concepts, let's take a look at the physical organization of the associative array:
The key is a triad:
row: String
column: String
timestamp: long
To every key we associate a value that is an array of bytes
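A minimal, purely illustrative sketch of this map model in Java may help to fix the idea (CellKey is my own name, not a real BigTable or DataStore class):
// Purely illustrative model of BigTable's map: key = (row, column, timestamp),
// value = byte[]. This is NOT a real BigTable or DataStore API.
public class CellKey implements Comparable<CellKey> {
    public final String row;       // up to 64KB, lexicographically ordered
    public final String column;    // written as "family:qualifier"
    public final long timestamp;   // version of the cell

    public CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    public int compareTo(CellKey other) {
        int c = row.compareTo(other.row);
        if (c != 0) return c;
        c = column.compareTo(other.column);
        if (c != 0) return c;
        return Long.compare(other.timestamp, timestamp); // newer versions first
    }
}
// The whole "table" is then conceptually a sorted map from CellKey to byte[]:
// java.util.SortedMap<CellKey, byte[]> bigTable = new java.util.TreeMap<CellKey, byte[]>();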
From here it's only a small step to BigTable's logical ROW view (example from the official BigTable documentation):
Where
the row key is "com.cnn.www" (in relational terms, it plays the role of the primary key)
we have three columns: "contents:", "anchor:cnnsi.com" and "anchor:my.look.ca" (later I'll explain why I used the colon)
"contents:" holds three versions of the page's HTML (at times t3, t4 and t6)
the two "anchor" columns each hold a single version containing the link description text.
The logical row in BigTable has some important features:
The row key is an arbitrary string, up to 64KB, that is usually a few bytes long and carries some meaning in the problem domain
Reads and writes of a single row are atomic, regardless of the number of columns involved
The row key is also used to order data lexicographically and to partition them among the servers in the cloud
Rows are stored inside the Google cloud in tablets. Every tablet contains a range of rows, lexicographically ordered by key, and is the minimal unit of distribution and load balancing (about 200MB). To maximize efficiency, hence performance, it is really important to choose row keys that minimize the number of tablets accessed.
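As a concrete example of key design, the "com.cnn.www" row key seen above is a reversed hostname: storing pages under reversed domains keeps all the pages of a site lexicographically adjacent, and therefore likely within the same tablet. A small hypothetical helper to build such keys:
// Hypothetical helper: reverses the host part of a URL so that pages of the same
// domain get adjacent row keys ("www.cnn.com" + "/index.html" -> "com.cnn.www/index.html").
public class RowKeys {
    public static String reversedDomainKey(String host, String path) {
        String[] parts = host.split("\\.");
        StringBuilder key = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            key.append(parts[i]);
            if (i > 0) {
                key.append('.');
            }
        }
        return key.append(path).toString();
    }

    public static void main(String[] args) {
        // Both keys share the "com.cnn" prefix, so they sort close together.
        System.out.println(reversedDomainKey("www.cnn.com", "/index.html"));
        System.out.println(reversedDomainKey("edition.cnn.com", "/world"));
    }
}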
Another interesting abstraction in BigTable is the "column family": a group of columns that represent the same concept. For instance, "anchor:com.yahoo.www" and "anchor:com.google.www" are two columns belonging to the same "anchor" family. A column is always written with the syntax "family:qualifier", and families are not only used to group similar concepts: they are also the minimal unit of access control. For every column family in a table we can decide who can read, write and/or update.
The last concept we must understand to work with BigTable is the "timestamp", a 64-bit integer that resembles the long field we use for optimistic locking in many ORMs. Here it is used to keep many versions of the same row in the table, and for this reason we have to choose a policy for the creation of new versions and the garbage collection of old ones. We can decide to keep only the versions newer than a time T, or perhaps only the last n versions; this is a choice made by the developer using the table (unfortunately versioning is a BigTable feature that we cannot use from the DataStore).
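Just to fix the idea, here is a toy sketch of the "keep only the last n versions" policy applied to a single cell; it has nothing to do with the real BigTable implementation:
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a single cell holding several timestamped versions of a value.
public class VersionedCell {
    // timestamp -> value, with the newest version first (reverse order).
    private final NavigableMap<Long, byte[]> versions =
            new TreeMap<Long, byte[]>(Collections.<Long>reverseOrder());

    public void put(long timestamp, byte[] value, int keepLastN) {
        versions.put(timestamp, value);
        // Garbage-collect everything beyond the last n versions.
        while (versions.size() > keepLastN) {
            versions.pollLastEntry();
        }
    }

    public byte[] latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}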
DataStore's additions on top of BigTable
So now we have an idea of what BigTable is, but the DataStore gives us more. In fact, on top of it the DataStore adds:
Query on properties (a property is, roughly, what BigTable calls a column family)
Query on multiple properties
Transactions
Query on properties
Let's use a simple example for a better understanding. Suppose we have an Entity defined this way (a very simple JPA view):
import javax.persistence.Entity;
import javax.persistence.Id;

import com.google.appengine.api.datastore.Key;

@Entity
public class MyEntity {
    @Id
    public Key key;
    public long property1;
    public long property2;
    public long property3;
}
and we want to execute this JPA-QL query:
SELECT p
FROM MyEntity p
WHERE p.property1 = 2
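For completeness, here is how such a query is typically executed with JPA on App Engine; a minimal sketch assuming the same EMF helper class used in the transaction example later in this article:
// Minimal sketch, assuming the EMF helper used later for the EntityManagerFactory.
EntityManager em = EMF.getEntityManagerFactory().createEntityManager();
try {
    List<MyEntity> result =
            em.createQuery("SELECT p FROM MyEntity p WHERE p.property1 = 2")
              .getResultList();
    for (MyEntity entity : result) {
        System.out.println(entity.key);
    }
} finally {
    em.close();
}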
Thinking about the way BigTable stores data, we might imagine such a query to be a resource-intensive task. But it is not: the DataStore comes to our help and organizes data so that we can query them easily, thanks to indexes. A DataStore index is a table with "well organized" keys and no values, lexicographically ordered in a way that makes access to table data very fast. An example tailored to the previous query is this:
MyEntity:property1/2:key:key1    <novalue>
MyEntity:property1/2:key:key3    <novalue>
MyEntity:property1/4:key:key2    <novalue>
where the rows with key1 and key3 are selected (both have property1 = 2).
We must also be aware that indexes are not a holy grail: being fast is not just a matter of having an index, but of finding the right balance between the indexes we define and the time spent updating them on every insert. In fact, while query time does not depend much on the size of an index, the number of indexes heavily affects the insert time of rows in a table, because every index must be updated.
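The same single-property query can also be expressed with the low-level DataStore API, which makes the equality filter, and therefore the index being walked, more explicit. A minimal sketch:
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;

public class LowLevelQueryExample {
    public static void printMatches() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        // One equality filter on property1: served by the single-property index.
        Query query = new Query("MyEntity");
        query.addFilter("property1", FilterOperator.EQUAL, 2L);
        PreparedQuery prepared = datastore.prepare(query);
        for (Entity entity : prepared.asIterable()) {
            System.out.println(entity.getKey() + " -> " + entity.getProperty("property1"));
        }
    }
}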
Multi-property queries
Suppose now we want to query on more than one property:
SELECT p
FROM MyEntity p
WHERE p.property1 = 1
AND p.property2 = 2
AND p.property3 = 3
this is done by walking each property's index and intersecting the results, starting with the first index, on property1 (the matching row is the one ending with key1):
MyEntity:property1/1:key:key1    <novalue>
MyEntity:property1/2:key:key2    <novalue>
MyEntity:property1/3:key:key3    <novalue>
the second index on property2:
MyEntity:property2/2:key:key1    <novalue>
MyEntity:property2/3:key:key2    <novalue>
MyEntity:property2/4:key:key3    <novalue>
and finally the third index, on property3:
MyEntity:property3/3:key:key1    <novalue>
MyEntity:property3/4:key:key2    <novalue>
MyEntity:property3/5:key:key3    <novalue>
Here it is clear that "key1" is selected, because it is the only key that appears in all three indexes with the requested values.
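The real DataStore performs this intersection as a merge over the sorted index scans; the following toy sketch only shows the set-intersection idea, not the actual algorithm:
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration: intersect the entity keys returned by each property index.
public class IndexIntersection {
    public static Set<String> intersect(List<List<String>> keysPerIndex) {
        Set<String> result = new LinkedHashSet<String>(keysPerIndex.get(0));
        for (List<String> keys : keysPerIndex.subList(1, keysPerIndex.size())) {
            result.retainAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fromIndex1 = Arrays.asList("key1"); // rows with property1 = 1
        List<String> fromIndex2 = Arrays.asList("key1"); // rows with property2 = 2
        List<String> fromIndex3 = Arrays.asList("key1"); // rows with property3 = 3
        System.out.println(intersect(Arrays.asList(fromIndex1, fromIndex2, fromIndex3))); // [key1]
    }
}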
Entity Groups and Transactions
This is another key concept of the DataStore, and we must understand it well to work productively, whether with the native API, JDO/JPA or whatever API we use to persist our data. It is an abstraction: it does not change the way data are stored, only the way keys are generated. Suppose we have a simple hierarchy of tables representing the teams, players and scores of the Italian Football Championship (Serie A):
This entity group is represented in the DataStore through an appropriate choice of entity keys:
Here the ROOT entity (Fiorentina) is the key that makes the transaction work. All entities in the entity group refer to its version timestamp, which acts as a flag marking a successfully completed transaction. If everything goes well, the commit updates this timestamp to signal that the entities in the group are all consistent. If something goes wrong, nothing is written on the root entity, and the dirty entities (those without a valid ROOT entity) will later be garbage collected.
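In terms of keys, belonging to an entity group means that every child entity's key carries the root key as an ancestor. A minimal sketch with the low-level KeyFactory (the kinds and identifiers below are my own guesses based on the example):
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class EntityGroupKeys {
    // Builds the key of a Score belonging to a Player belonging to a Team:
    // the whole chain shares the Team root, hence the same entity group.
    public static Key scoreKey(String teamName, String playerName, long scoreId) {
        return new KeyFactory.Builder("Team", teamName)
                .addChild("Player", playerName)
                .addChild("Score", scoreId)
                .getKey();
    }
}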
Thanks to this abstraction we can write code like the following, which is very common in JPA programming and trivial to do on a relational database:
EntityManager em = EMF.getEntityManagerFactory().createEntityManager();
EntityTransaction tx = em.getTransaction();
tx.begin();

// Root entity of the group
Team fiore = new Team();
fiore.setDescription("Fiorentina");
...
...
em.persist(fiore);

// Child entities: the addXXX helpers maintain the inverse mapping
Player gila = new Player();
gila.setName("Alberto Gilardino");
...
...
fiore.addPlayer(gila);

Score goal = new Score();
goal.setMatch("Liverpool - Fiorentina");
gila.addScore(goal);

tx.commit();
em.close();
Note that we only need to persist the ROOT entity, and that we rely on utility methods (addXXX) that manage the inverse mapping needed for real ownership of the child objects. If something goes wrong before the commit, everything will be rolled back.
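As an example of what such a helper might look like, here is a hypothetical addPlayer inside Team; the players field and the Player.setTeam method are my assumptions, not part of the original mapping:
// Hypothetical helper inside the Team entity.
public void addPlayer(Player player) {
    if (players == null) {
        players = new java.util.ArrayList<Player>();
    }
    players.add(player);
    player.setTeam(this); // keep the inverse side of the relationship in sync
}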
Conclusions: relational is inside us
We have been trained to think "relational", because the winning technology of the last 35 years has been the Relational Database Management System. We often assume that everything can be modeled well with the relational model, but that may not be true: just think of the effort we spend every time we map our Java objects, even with modern ORMs.
This is not only a technological challenge, because this time we have to work on our "forma mentis" in order to fully exploit the new cloud-computing tools that Google and other vendors are going to provide us.
References
http://code.google.com/intl/it-IT/appengine/docs/java/datastore/overview.html
http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx
http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php
http://oakleafblog.blogspot.com/2008/04/comparing-google-app-engine-amazon.html
http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html