What is FoundationDB?
FoundationDB is a distributed database. FoundationDB is designed in such a way so that it can handle large volumes of structured data across clusters. It provides ACID transactions for all operations and also organizes data as an ordered key-value store. It is particularly well-suited for read/write workloads; however, it additionally has wonderful performance for write-intensive workloads. Users can interact with the database using API language.
It started initially in 2009 as a proprietary database product and was one of the earliest to attempt adding ACID transactions to NoSQL databases.
Why FoundationDB is Necessary?
NoSQL database design involves many fundamental technical alternatives. Among these, transactions yield the most notable advantages. FoundationDB is designed in such a way so that it can perform transaction processing with higher performance at scale.
- Transaction Manifesto: Full ACID transactions are a powerful tool with noteworthy advantages. Transactions are essential to developing applications using modern distributed databases.
- Fault Tolerance: It provides fault tolerance by intelligently replicating data across a distributed cluster of machines. The architecture is intended to minimize service interruption and data loss in the event of machine failures.
- Multi-model Data Store: We can store many types of data in a single database in FoundationDB as it is a multi-model data store. All data in FoundationDB is stored safely, distributed, and replicated in a Key-Value Store component.
- Industry-Leading Performance: FoundationDB supports very heavy loads at a very low cost and hence provides high performance on commodity hardware.
- Ready for Production & Open-Source: FoundationDB has been hardened with lessons learned as it has been running in production for years, and it is open-source with a huge community.
- Scalability: FoundationDB is horizontally scalable. It can achieve high performance per node in configurations from a single machine to a large cluster. It also offers elasticity, which allows machines to automate load balancing and data distribution with a feature of provisioning or de-provisioning in a running cluster.
What is the Architecture of FoundationDB?
The FoundationDB architecture comes with a decoupled design, and multiple heterogeneous roles are assigned to processes. Coordinators, Storage Servers, Master are among these roles.
Scaling can be achieved by expanding the number of processes for separate roles horizontally:
- Coordinators: The FoundationDB cluster can be connected to all clients and servers with a cluster file containing the HOST: PORT of the coordinators. The coordinators are used to connect both the clients and servers with the cluster controller.
- Cluster Controller: It is a singleton that is elected by a majority of coordinators. It acts as the entry point for all processes in the cluster. It determines when a process has failed, tells processes which roles they should become, and passes system information between all of the processes.
- Master: The master is responsible for coordination in between the transition of the write sub-system from one to other generations. Master, proxies, resolvers, and transaction logs are all included in the write sub-system. The 3 roles are treated as a unit and fizzle. A substitution for every one of the three jobs is selected again.
- Proxies: It is used to read versions, commit transactions, and track the storage servers. A proxy will ask other all proxies to see the largest committed version to provide a read version and verify simultaneously if the transaction logs are not stopped.
- Transaction logs: It makes mutations durable to disk for fast commit latencies. It receives commits from the proxy in version order and only responds to the proxy once the data has been written and synced.
- Resolvers: It determines if there are any conflicts between transactions. A conflict can be defined when it reads a key that has been written between the transaction’s commit version and read version.
- Storage Servers: The majority of processes in a cluster are storage servers. They are assigned ranges of key, and hence also responsible for storing data for that range.
- Data Distributor: It manages storage servers and also decides which storage server will manage which data range. It ensures that the data across all storage servers is evenly distributed.
- Gatekeeper: It monitors and maintains system load and is also responsible for slowing down client transaction rate if the DB cluster is close to saturation.
- Clients: A client is responsible for linking with specific client libraries to establish communication with a FoundationDB cluster. Currently, C, Go, Python, Java, Ruby api bindings are officially supported.
Data Modelling on FoundationDB
Designer readiness is straightforwardly corresponded to how effectively and proficiently the application’s mapping and query needs can be displayed or modeled in the database. FoundationDB offers different alternatives for this issue.
Key-Value
FoundationDB offers a key-value API. Quite possibly, the main component of this API is that it preserves dictionary ordering of the keys. Be that as it may, the keys and values are always byte strings. This implies that all other application-level data types (like integers, floats, arrays, dates, timestamps, and so on) cannot be straightforwardly addressed in the API and must be displayed with explicit encoding and serialization. This can be a very difficult exercise for developers. FoundationDB attempts to ease this issue by giving a Tuple Layer that encodes tuples like (state, country) into keys so that peruses can utilize the prefix (state). FoundationDB’s key-value API isn’t implied for creating applications directly. It has crude fixings that can be blended in different manners to make the data structures that would then be utilized to code application rationale.
Document
A MongoDB 3.0-compatible Document Layer was released in the new v6.0 arrival of FoundationDB. Likewise, with some other FoundationDB Layer, the Document Layer is a stateless API that inside is based on top of a similar core of FoundationDB key-value API. The open purpose here is to deal with two of the most vexing issues by MongoDB deployments: seamless horizontal write scaling and adaptation to internal failure with zero data loss. Nonetheless, the expansion in application deployment intricacy can be critical. Each application case currently needs to run a Document Layer instance as a sidecar on a similar host. All application instances interface with a Document Layer administration through an External Load Balancer.
Relational
FoundationDB doesn’t yet offer a SQL-viable social layer. The nearest it has is the record-situated Record Layer. The objective is to assist developers with overseeing organized or structured records with specifically columns, schema changes, built-in secondary indexes, and definitive query execution. Other than the secondary index management, there are no relational data modeling builds like JOINs and foreign keys accessible. Additionally, note that records are examples of Protobuf messages that must be made/overseen expressly rather than utilizing a more significant level ORM framework familiar with relational databases.
FoundationDB Performance
With the declaration of Version Three, FoundationDB dispatched their new transactional processing engine: “This has been a massive project for us. It depends on an absolutely new scalable design. The benchmark that is shown is running 14.4 million transactions/second, so that is the order of magnitude faster as compare to the Netflix test.”
To provide high throughputs and low latencies to the application at various scales, FoundationDB uses commodity hardware.
Scaling
It scales linearly with the number of cores in a cluster. Here, a cluster of commodity hardware scales to the 8.2 million operations/sec doing a 90% read and 10% write workload with 16-byte keys and values somewhere in the range of 8 and 100 bytes.
A 24-machine EC2 c3.8xlarge cluster is used by a scaling graph in which each machine has a 16-core processor. When running FoundationDB server processes on each core, it yields a 384-process cluster for the largest test and downsizes the cluster for each more modest test.
Scaling is the ability to convey operations at various scales efficiently. For FoundationDB, the applicable operations are reads and write, estimated in operations per sec. Scale is estimated in the number of processes, which will, for the most part, track the quantity of accessible cores. FoundationDB offers scalability from fractional usage of a solitary core on a single machine to full use of dozens of powerful multi-core machines in a cluster.
Latency
FoundationDB has low latencies over an expansive range of workloads that only increase modestly as the cluster approaches immersion.
A 12-machine cluster is used by the latency graph in which each machine has a 4-core (E3-1240) processor and a single SATA SSD. When running a FoundationDB server process on each core, it yields a 48-process cluster.
Latency is the time needed to finish a given operation. Latencies are typically measured in milliseconds (ms) in FoundationDB. Like all other systems,
FoundationDB operates at low latencies while under the low burden and expanding latencies as the heap moves toward saturation. FoundationDB is designed to keep latencies low even at moderate loads. As loads approach saturation, latencies increment as requests are lined up.
For FoundationDB, the critical latencies are those experienced by a FoundationDB client who prepares and submits a transaction. Until or unless the transaction is committed, no latency is incurred by writes.
* Does not support ACID properties on multiple shards.
FoundationDB has an exceptional benefit, and that is programmed resharding. When one server is full, the DBMS itself ensures even loading of machines in the cluster by redistributing data to neighboring ones in the background. Simultaneously, the assurance of the level of Serializable for all transactions is preserved and protected, and the lone impact perceptible to clients is a slight expansion in latency of responses. The database guarantees that the amount of data on the most and least stacked cluster servers contrasts by close to 5%.
Throughput
FoundationDB gives excellent throughput to the full scope to read and writes jobs.
FoundationDB offers two storage engines, enhanced for distinct use cases, both of which write to disk before reporting transactions committed. For each storage engine, the graph shows the throughput of a single FoundationDB process running on a single core with saturating read/write jobs ranging from 100% reads to 100% writes, all with 16-byte keys and values somewhere in the range of 8 and 100 bytes.
The throughput graph utilizes a single FoundationDB server process on a single core.
Throughput is the absolute number of operations effectively finished by a system in a given timeframe. In FoundationDB, the throughput is measured in operations. In other words, some blend of reads and writes each second.
The memory engine is improved for datasets inside and outfit of memory, with auxiliary stockpiling utilized for solid composes yet not peruses. The SSD motor is streamlined for datasets that don’t fit in memory, with some peruses being served from a discretionary capacity.
Since SATA SSDs are just around 50 times slower than memory, they can be combined with memory to achieve throughputs on the similar order of magnitude as memory alone as long as cache-hit rates are reasonable. The SSD engine exploits this property. Interestingly, spinning disks are 5,000 times slower than memory and drastically degrade throughput as cache hits fall apparently beneath 100%.
FoundationDB will only reach the maximum throughputs with a profoundly concurrent workload. In fact, for a given average latency, concurrency is the primary driver of throughput.
Concurrency
The architecture of foundation DB is designed and expected to achieve top-level performance under a high concurrency rate from many users.
Its asynchronous design permits it to deal with exceptionally high concurrency, and for a typical workload with 90% reads and 10% writes, most throughput is reached at around 200 concurrent operations. This number of operations was accomplished with 20 concurrent transactions per FoundationDB process, each running 10 operations with 16-byte keys and values somewhere in the range of 8 and 100 bytes.
A ratio relates average throughput and latency referred to in queuing theory as Little’s Law for a given system. The practical application of this law states:
throughput = outstanding requests / latency
The ramifications of this relation is that, at a given latency, we can amplify throughput only by concurrently submitting enough outstanding requests. A FoundationDB cluster might have the commit latency of 2 ms and yet will be capable of far more commits each second (more than 500). A huge number of submissions each second are effectively feasible. To accomplish this rate, there must be hundreds of requests happening concurrently. Not having enough forthcoming requests is the single biggest reason for low performance.
CloudKit Use Case
CloudKit uses FoundationDB, Apple’s cloud backend service, to serve a large number of clients. Within CloudKit, a given application is tended to by a logical container, characterized by a schema that determines the record types, typed fields, and indexes expected to work with proficient record access and queries.
The application clients store records inside named zones to put together records into logical groups, which can be adjusted explicitly across client devices.
CloudKit was carried out utilizing Cassandra; Cassandra forestalled concurrency inside a zone, and multi-record atomic operations were perused to a single partition. The execution of CloudKit on FoundationDB and the Record Layer resolve the two issues. Transactions are now stretched to the complete database, permitting CloudKit zones to become remarkably bigger than before. Transactions additionally support simultaneous updates to various records inside a zone.
Conclusion
The principle thought of FoundationDB is to decouple transaction processing from logging and storage. The decoupling of logging and the determinism in transaction work on recuperation by removing redo and undo log processing from the critical path, subsequently allowing unusually quick recovery time and further improving availability. Such an unbundled architecture empowers the partition and horizontal scaling of both read and writes handling.
FoundationDB is a greatly adaptable, scalable, and a fast transactional distributed database, probably the best testing and adaptation to fault-tolerance on earth. It’s in widespread production use at Apple and a few other significant organizations. The truly intriguing part is that it gives an amazingly effective and low-level interface for whatever other framework requires scalably storing a consistent state. At last, deterministic and randomized simulation has guaranteed the rightness of the database implementation. The experience and evaluation of FoundationDB in cloud workloads show how it can meet challenging prerequisites in business.