The essence of an embedded database
Posted by Steven Graves, Mar 12 2010, 06:12 AM
The term 'embedded database' has been around since the mid-1980s. It was originally coined to mean a database system that is embedded within application code. In other words, the database management system is delivered as a library that you, the developer, link with your application code (and other libraries) to create an executable. In that sense, the database system functionality is 'embedded' within your application code. Hence the name "embedded database."
Since the late 1990s, embedded database system vendors have been trying to sell their technology to developers of embedded systems. This has created a lot of (unfortunate) confusion. In the 10 or so years since, some folks have come to equate "embedded database" with "database for embedded systems", which has led them down a path to frustration and, in some cases, project failure.
Why? Because the vast majority of embedded databases were not written with the unique characteristics (slower CPUs, limited memory, no persistent storage, etc.) of embedded systems in mind. In fact, many embedded database systems were created in the 1980s, long before anyone considered using an embedded database in such systems (remember that most embedded systems in that era were 8- and 16-bit systems that simply couldn't address enough memory to permit use of a COTS embedded database system).
Unfortunately, some embedded database vendors haven't helped the situation. They have adjusted to changing market conditions by re-casting their embedded database products as a solution to the data management needs of embedded systems, even though their technology was not written for, and in fact is not suited to, embedded systems. These changing market conditions include the rise of open source/dual-license products like MySQL and BerkeleyDB, which became dominant players in the line-of-business client/server DBMS market and the embedded database market, respectively, and the emergence of free entry-level RDBMS offerings from Oracle and Microsoft (Oracle 10g Express Edition and SQL Server Express Edition, respectively). Faced with these challenges, vendors of proprietary, closed source, commercial (not free) embedded database products found it increasingly difficult to compete, and sought “green fields” in the embedded systems software market for their products.
As an aside, the media recognized the situation in the early part of the last decade, and SD Times in particular tried to popularize a new term, "application-specific database." Unfortunately, the term didn't stick and we are still left with 'embedded database'.
So, back to the subject of this blog post. What is the essential attribute of an embedded database system? It is exactly what I described in the opening paragraph: The database system functionality is linked with application code and resides in the same address space. This contrasts with a client/server DBMS architecture, in which the database server exists as a standalone executable, accessed by client programs through an inter-process communication (IPC) and/or remote-procedure-call (RPC) mechanism.
In short, an embedded database system should exist wholly within your application's address space and not require communication with any external agent. Anything external is an immediate tip-off that the DBMS is not, in fact, wholly embedded.
As a former colleague of mine, a VP of marketing, once said to me: "What is the 'so, what' of it?" Excellent question. Why should anyone give a hoot?
Perhaps outside the embedded systems market, nobody does (though even that is arguable). But in embedded and real-time systems, the "so, what?" is performance. The need to communicate with an external program, for any purpose, imposes a performance hit that few real-time/embedded systems can afford. This is true regardless of whether that external program is a lock manager, lock arbiter, deadlock detector, or anything else.
Another "so, what?" is the introduction of dependencies on external components, notably a communication protocol like TCP/IP. Communication between the application (with the database system embedded within it) and an external component also necessarily increases the complexity, fragility, and, consequently, the potential need for administration. These dependencies might not be a big deal in line-of-business systems running on PCs and other systems running robust operating systems like Windows, Linux and Solaris and in organizations with an IT staff. But for an unattended embedded system running on a relatively modest CPU, with a simple RTOS and limited network connectivity/bandwidth, it can be a killer.
Since I am writing this blog post, it should be no surprise that eXtremeDB is an embedded database in the true sense. eXtremeDB never requires communication with an external component. We do offer remote interfaces to eXtremeDB databases through both our native and SQL APIs, and the High Availability edition requires a communication channel for synchronizing master and replica databases, and replicating transactions. But these are optional.
If you have demanding performance requirements, limited resources, and/or are developing an embedded system that absolutely, positively must run unattended (i.e., "zero administration"), then carefully consider your choice of embedded database system.
Commercial-Off-The-Shelf (COTS) or Roll-Your-Own (RYO)
Posted by Steven Graves, Dec 17 2009, 01:44 PM
Some of us here at McObject just read this article on embedded.com titled "Making the case for commercial communication integration middleware."
A lot of the suppositions in that article with respect to the case for COTS operating systems and communications middleware also ring true for database systems. From the beginning of McObject, we've recognized that RYO represents our largest "competitor".
One of the arguments put forward by Dr. Krasner is that "RYO solutions tend to be designed and implemented based on initial connectivity requirements and thus are very brittle when new requirements are introduced." The same can be said of RYO data management solutions. You have a specific need and you write a solution for that specific need. When changes come later, the original solution needs some retrofitting, which can range from a minor tweak to a major overhaul.
Dr. Krasner goes on to say that "These [RYO] designs are tightly coupled...", which is true. In contrast, COTS solutions are designed from the start to solve a wide variety of database management problems, and are not designed to solve one specific problem. Consequently, they are loosely coupled.
After listing some limitations of RYO, Dr. Krasner says "Over time, as the above issues are addressed, RYO middleware ... often ends up becoming a full-blown infrastructure..." I can't tell you how many times I've seen that outcome in the context of RYO data management solutions, too. Inevitably, the organization that embraces RYO finds itself committing ever greater resources to a custom, in-house database management system until one day somebody realizes it and initiates a search for a COTS replacement so that its staff can return their attention to their own core competence.
The balance of the article makes the case for the return-on-investment (ROI) of COTS by putting some numbers to the metrics for successful projects involving communications middleware. I'd love to see a similar study for COTS vs RYO database systems. I'd bet dollars to donuts that the case would be as, or more, compelling for COTS database systems.
McObject’s customers are more likely to express the economic benefit from using our eXtremeDB embedded database in terms of developer-weeks or developer-months saved. For example, Boeing credited eXtremeDB with saving 18 developer-months in an upgrade to the embedded software in its Apache Longbow helicopter. Another customer, IP Trade, provides a communications system for securities traders. IP Trade’s head of development pointed to 6 programmer-months saved by using eXtremeDB rather than building data management code from scratch. Presumably those calculations include both development and QA – but ease of updates, code maintenance and other downstream benefits of using a COTS database will add to the savings.
Database Devices in eXtremeDB 4.0
Posted by Steven Graves, Dec 2 2009, 11:17 AM
eXtremeDB 4.0, which was released last month, includes a new, cleaner interface for creating/opening databases, especially when a database is a hybrid (in-memory and on-disk) database.
Prior to version 4.0, two functions were needed to open a hybrid eXtremeDB database: mco_db_open() and mco_disk_open(). To provide a consistent interface irrespective of whether a database is entirely in memory, entirely on disk, or a hybrid, we introduced the concept of database 'devices' to eXtremeDB. In 4.0, the approach is to define an array of structures that describe the devices a database needs, and to pass that array as an argument to the new mco_db_open_dev() interface. For backward compatibility, in-memory databases can still be opened with the legacy mco_db_open() API.
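To make the "array of device descriptors" idea concrete, here is a minimal C sketch. Every name in it (sketch_device_t, sketch_classify, and so on) is hypothetical and invented for this illustration. It is not the eXtremeDB API, whose real structures and the signature of mco_db_open_dev() should be taken from the product documentation. The sketch only shows how one array can describe an in-memory, on-disk, or hybrid database, and it ignores details such as the cache memory a real on-disk database would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device kinds, loosely following the kinds named in the post. */
typedef enum {
    SKETCH_DEV_MEMORY,      /* conventional memory segment */
    SKETCH_DEV_SHARED_MEM,  /* shared memory segment */
    SKETCH_DEV_FILE,        /* simple file path */
    SKETCH_DEV_MULTIFILE,   /* multi-file path */
    SKETCH_DEV_RAID         /* RAID-type device */
} sketch_dev_kind;

/* Hypothetical descriptor: one array element per device the database needs. */
typedef struct {
    sketch_dev_kind kind;
    const char     *path;   /* file-backed devices only; NULL for memory */
    size_t          size;   /* bytes; meaningful for memory devices here */
} sketch_device_t;

typedef enum {
    SKETCH_DB_INVALID,
    SKETCH_DB_IN_MEMORY,
    SKETCH_DB_ON_DISK,
    SKETCH_DB_HYBRID
} sketch_db_kind;

/* Decide what kind of database a device array describes. */
sketch_db_kind sketch_classify(const sketch_device_t *devs, size_t n)
{
    size_t mem = 0, disk = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].kind == SKETCH_DEV_MEMORY ||
            devs[i].kind == SKETCH_DEV_SHARED_MEM) {
            mem++;
        } else {
            if (devs[i].path == NULL)   /* file-backed device needs a path */
                return SKETCH_DB_INVALID;
            disk++;
        }
    }
    if (mem == 0 && disk == 0) return SKETCH_DB_INVALID;
    if (disk == 0)             return SKETCH_DB_IN_MEMORY;
    if (mem == 0)              return SKETCH_DB_ON_DISK;
    return SKETCH_DB_HYBRID;
}
```

The point of the design is that the caller's code path is the same in all three cases: only the contents of the array change.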
A 'device' can be a conventional memory segment, a shared memory segment, a simple file path (for the database and/or log file), a multi-file path, or a RAID-type device.
A multi-file path can be considered a virtual file consisting of multiple segments. When the first segment is full, we start filling the second one, and so on. For file systems with 2GB size limits, a multi-file device allows for on-disk databases >2GB.
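The "virtual file" idea comes down to simple address arithmetic. The following C sketch maps a logical byte offset in the virtual file to a segment index and an offset inside that segment. It assumes every segment has the same fixed capacity, which the post does not state. The names sketch_location and sketch_locate are hypothetical and are not part of eXtremeDB.

```c
#include <assert.h>
#include <stdint.h>

/* Where a logical byte of the virtual file physically lives. */
typedef struct {
    uint32_t segment;  /* index of the segment file, starting at 0 */
    uint64_t offset;   /* byte offset inside that segment */
} sketch_location;

/* Map a logical offset to (segment, offset), assuming equal-size segments
 * that are filled one after another. */
sketch_location sketch_locate(uint64_t logical_offset, uint64_t segment_size)
{
    sketch_location loc;

    assert(segment_size > 0);
    loc.segment = (uint32_t)(logical_offset / segment_size);
    loc.offset  = logical_offset % segment_size;
    return loc;
}
```

With 2GB segments, for example, logical offset 3GB lands 1GB into the second segment (index 1), so no single file ever has to grow past the file system's limit.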
RAID-style devices offer two additional capabilities: in a RAID-0 configuration (striping), database pages are scattered between RAID segments. It is assumed that each RAID segment resides on a physically separate device so that writing to two segments in separate devices can proceed in parallel. Obviously, this also requires hardware support (i.e. that there are separate controllers and I/O channels such that the read/write operations are not serialized through a single controller/channel).
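One simple way to scatter pages across segments is round-robin placement, sketched below in C. This is an assumption made for illustration: the post does not say which placement rule eXtremeDB actually uses, and the names sketch_stripe_slot and sketch_stripe are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Physical placement of one database page in a striped set. */
typedef struct {
    uint32_t segment;          /* which RAID segment holds the page */
    uint64_t page_in_segment;  /* page index within that segment */
} sketch_stripe_slot;

/* Round-robin striping: consecutive pages go to consecutive segments. */
sketch_stripe_slot sketch_stripe(uint64_t page_no, uint32_t n_segments)
{
    sketch_stripe_slot slot;

    assert(n_segments > 0);
    slot.segment         = (uint32_t)(page_no % n_segments);
    slot.page_in_segment = page_no / n_segments;
    return slot;
}
```

Under this rule, pages 4 and 5 of a two-segment set land on segments 0 and 1 respectively, so the two writes can proceed in parallel when the segments sit on separate hardware.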
A RAID-1 configuration (mirroring) is also supported. In this configuration, the database pages are written simultaneously to each device. This can improve reliability by avoiding the need to perform a restore from a previous backup in case of a disk crash, and potential loss of data if the roll-forward transaction log happened to be on the same device as the database file(s).
In summary, devices in eXtremeDB 4.0 provide a more elegant programming interface. They also made it possible to extend eXtremeDB to support on-disk databases >2GB even if the underlying file system has a 2GB file size limit, and to support striping and mirroring of databases. These capabilities, in addition to the 4.0 MVCC transaction manager, have the potential to further exploit multi-core hardware and parallel programming.
eXtremeDB 4.0 released
Posted by Steven Graves, Nov 11 2009, 02:27 PM
On Monday, we announced the release of eXtremeDB 4.0. Evaluation versions of the eXtremeDB Standard Edition (in-memory) and eXtremeDB Fusion (hybrid in-memory and on-disk) database system, with and without eXtremeSQL, for 32-bit Windows and Linux are available for download now.
Two central themes summarize the 4.0 release: Leveraging multi-core and expanding the already-expansive choices of APIs, index types, and more. eXtremeDB 4.0 includes new choices for the transaction manager, a new index type, a new programming interface choice, and other improvements to maximize multi-core and make it easier to work with hybrid databases.
I'll discuss the new transaction manager today, and other major features of 4.0 in the days to come.
eXtremeDB 4.0 introduces a new transaction manager to the product: the MVCC transaction manager. MVCC is an acronym for Multi-Version Concurrency Control. This is a concurrency control technique often found in our big cousins (Oracle et al.) but not in database systems for embedded systems. Until now, that is. Previous versions of eXtremeDB employed a Multiple Reader Single Writer transaction manager (we call it MURSIW and pronounce it "mer siv"). This transaction manager was, and is, fantastic for an in-memory database system for embedded systems with relatively few concurrent tasks/threads. In such a setting, the cost of complex lock arbitration is unjustifiable. Over the years, though, eXtremeDB has expanded beyond our initial target market, and has acquired new functionality (like the hybrid capability where some or all of the database is stored on persistent media). The MURSIW transaction manager doesn't always fit in these environments; there may be more concurrent threads updating the database than MURSIW can efficiently handle, or the storage media may be too slow.
But, we didn't want to change the programming paradigm of eXtremeDB by introducing a lock arbiter and pessimistic locking APIs. And, given the accelerating adoption rate of multi-core systems in embedded systems, we also didn't want to implement a concurrency model that would create barriers to maximum utilization of multiple cores. So MVCC was a natural choice.
MVCC is an optimistic concurrency model. No task or thread is ever blocked by another because each is given its own copy of objects in the database to work with during a transaction. When a transaction is committed, its copies of the objects it modified are written back to the database. So no explicit locks are ever required during a transaction, and therefore there is no lock arbiter. Locks are implicitly applied by the eXtremeDB run-time when the transaction commits. It is possible that two tasks will try to modify the same object at the same time. In this case, one task will receive an error code, MCO_E_CONFLICT. So application logic needs to account for this possibility and be prepared to retry the transaction. Apart from this, your eXtremeDB application code doesn't change: you still wrap your database access code between mco_trans_start() and mco_trans_commit() calls.
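The retry requirement can be sketched as a small loop. The C code below is a self-contained illustration and does not call eXtremeDB. All names (sketch_rc, sketch_run_with_retry, sketch_flaky_txn) are hypothetical, and the bounded number of attempts is a design choice of this sketch, not something the post prescribes. In a real application the callback would presumably wrap the work between mco_trans_start() and mco_trans_commit() and report a conflict when the commit returns MCO_E_CONFLICT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes standing in for the run-time's return values. */
typedef enum { SKETCH_OK = 0, SKETCH_CONFLICT = 1 } sketch_rc;

/* One complete attempt: start, do the work, commit. */
typedef sketch_rc (*sketch_txn_fn)(void *ctx);

/* Run a transaction, retrying while it reports a conflict, up to
 * max_attempts attempts. Returns the result of the last attempt and,
 * if attempts_out is non-NULL, stores how many attempts were made. */
sketch_rc sketch_run_with_retry(sketch_txn_fn txn, void *ctx,
                                int max_attempts, int *attempts_out)
{
    sketch_rc rc = SKETCH_CONFLICT;
    int attempts = 0;

    assert(txn != NULL && max_attempts > 0);
    while (attempts < max_attempts) {
        attempts++;
        rc = txn(ctx);
        if (rc != SKETCH_CONFLICT)
            break;              /* success or a non-retryable result */
    }
    if (attempts_out != NULL)
        *attempts_out = attempts;
    return rc;
}

/* Simulated transaction for demonstration only: ctx points to an int
 * holding how many more times it should report a conflict. */
sketch_rc sketch_flaky_txn(void *ctx)
{
    int *remaining = (int *)ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_CONFLICT;
    }
    return SKETCH_OK;
}
```

Because each attempt restarts the whole transaction, the work inside the callback has to be safe to repeat; anything with side effects outside the database belongs after a successful commit.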
As I alluded to above, the MURSIW transaction manager is still part of eXtremeDB, so you can choose between MURSIW and MVCC. They are delivered as separate libraries, so you make the choice at build time, when you link your application. Which leads to the obvious questions: How do I choose, and what are the tradeoffs?
The characteristics that I described above that were the original design goals for MURSIW still make MURSIW the better choice when those characteristics are present: an in-memory database with a small number of concurrent tasks modifying the database. There is very little overhead with the MURSIW transaction manager. Read-only tasks can operate in parallel since they don't modify the database and therefore cannot interfere with each other. Read/write tasks will have exclusive use of the database for the duration of their transactions, but an eXtremeDB in-memory database is so fast that the transaction often completes faster than it would have been possible to perform a context switch to a lock arbiter, much less actually arbitrate access requests.
If there are more than a few concurrent tasks that need to modify the database, or you cannot tolerate having read-only requests blocked by a task that is modifying the database, or the storage media is anything other than RAM (i.e. a hybrid database on relatively slow HDD or SSD media) such that transactions run too long for MURSIW to make sense, then MVCC might be the better choice. MVCC carries more overhead because it has to create the copies of the objects for each task (read-only or read-write), track them, write them back in the case of a read-write transaction, and eventually discard unused objects. So if you compare MURSIW and MVCC with a single thread, for example, MURSIW will win every time because of the lower overhead. Likewise, if you compare MURSIW and MVCC when the access is largely read-only, for any number of concurrent threads, MURSIW will win. However, for concurrent write transactions, the additional overhead is quickly overcome on multi-core systems. Here are some pictures to illustrate the point.
The tests were executed on a quad-core system running Windows Vista. Each thread executed 1 million inserts and hash searches. The graphs show the time in milliseconds for the threads to complete the tasks. Smaller numbers are better. As you can see, with two and four concurrent threads writing to the database, the MVCC transaction manager overcomes the additional overhead, and with four threads attained a total throughput of over 1,600,000 inserts per second, and 4,000,000 searches on a hash index. This is a simple test for which the sole purpose is to illustrate the differences between MURSIW and MVCC.
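For readers who want to relate elapsed times to throughput, the conversion is simple arithmetic: total operations across all threads divided by wall-clock time. The C helper below is a hypothetical illustration (sketch_throughput is not an eXtremeDB function). The 2,500 ms used in the note after it is derived from the figures quoted above, not read off the graphs.

```c
#include <assert.h>

/* Aggregate operations per second when n_threads threads each perform
 * ops_per_thread operations and all finish within elapsed_ms. */
double sketch_throughput(unsigned n_threads, double ops_per_thread,
                         double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return (double)n_threads * ops_per_thread * 1000.0 / elapsed_ms;
}
```

Four threads at 1 million inserts each is 4 million inserts in total, so a rate of over 1,600,000 inserts per second corresponds to the insert phase finishing in under 2,500 ms.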
Another characteristic of the MVCC transaction manager is a higher memory requirement because it creates copies of objects for each concurrent task. We don't see this as much of an issue, however. eXtremeDB has always been exceptionally frugal with memory, so relative to alternatives, we had room to work with. And it is expected that the systems for which MVCC will be the logical choice are not typically resource-constrained.
Have questions? Leave me a comment (you need to be registered).
SSD Performance
Posted by Steven Graves, Nov 1 2009, 04:17 PM
Recently, I tweeted (see http://twitter.com/McGuy) a comment on the claim in this article http://www.javaworld.com/community/?q=node/3567 that "SSD performance is limited only by the SATA2 interface throughput."
Somebody replied that 250MB per second sustained read speed = 2Gb/s (gigabits per second). Add on the bus overhead and you have 3Gb/s (the SATA2 speed limit).
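The arithmetic behind that reply can be checked with a few lines of C. The sketch assumes decimal units (1 Gb = 1,000 Mb) and relies on the fact that the SATA 3 Gb/s link uses 8b/10b encoding, so each payload byte costs 10 bits on the wire. The function names are hypothetical.

```c
#include <assert.h>

/* Line rate in megabits/s needed to carry a payload of payload_mb_per_s
 * megabytes/s over an 8b/10b-encoded link (10 line bits per byte). */
double sketch_line_rate_mbps(double payload_mb_per_s)
{
    return payload_mb_per_s * 10.0;
}

/* Largest payload in megabytes/s that a given line rate can carry. */
double sketch_max_payload_mb_per_s(double line_rate_mbps)
{
    assert(line_rate_mbps > 0.0);
    return line_rate_mbps / 10.0;
}

/* Fraction of the link's capacity that a payload rate consumes. */
double sketch_link_utilization(double payload_mb_per_s, double line_rate_mbps)
{
    assert(line_rate_mbps > 0.0);
    return sketch_line_rate_mbps(payload_mb_per_s) / line_rate_mbps;
}
```

On these assumptions, 250 MB/s of payload is 2,000 megabits of data per second, occupies 2,500 Mb/s of the 3,000 Mb/s link once encoding is included, and the link tops out at 300 MB/s of payload.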
This page http://www.anandtech.com/storage/showdoc.a...i=3531&p=24 has test results that bear out the 250MB per second claim, but only just, and only in one specific test case - a sequential read of 2MB files.
Every other test conducted shows performance well south of 250MB/s. Sequential writes were 2nd best at 195 MB/s. The much more realistic random reads and random writes were 56.5 and 31.7 MB/s, respectively.
Further, these numbers were only attained by the cream of the crop. The much more 'pedestrian' results were under 30MB/s read and under 3MB/s write!!!
There's also no mention in the article of whether the tester conducted the tests with "fresh out of the box" SSDs or preconditioned the drives before conducting the tests. This can have a huge impact on real-life performance. See http://www.flashmemorysummit.com/English/C...torial_Amer.pdf for example.
So, I stand by my comment. SSDs, in real life, are nowhere near pushing the SATA2 speed limit, and comments to the contrary are just marketing hyperbole.








