tag:blogger.com,1999:blog-15589472120530864372024-03-13T01:28:07.572-07:00sanjoykr.blogspot.comLearn and ShareAnonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.comBlogger34125tag:blogger.com,1999:blog-1558947212053086437.post-12606825370656666642017-08-28T03:07:00.000-07:002017-08-28T03:07:56.823-07:00Paper Summary - Data Ingestion for the Connected World<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="font-size: 120%;">
<center>
<b>
Data Ingestion for the Connected World
</b>
</center>
</div>
<div>
<center>
<i>John Meehan, Cansu Aslantas, Stan Zdonik, Nesime Tatbul, Jiang Du</i>
</center>
</div>
<br /><br/>
<div>
Businesses use “Big Data” applications to perform timely analytics and make real-time or near-real-time decisions. The effectiveness of these analytics and decisions depends on how quickly the necessary data can be extracted, transformed, and loaded from the operational platform to the analytical platform while ensuring correctness. According to the authors, this is challenging for latency-sensitive “Big Data” applications because traditional ETL processes are cumbersome and very slow. They propose a new ETL architecture, which they call streaming ETL, that takes advantage of the push-based nature of a stream processing system.
</div><br/>
<div>
In this paper, the authors propose a set of requirements for streaming ETL. Streaming ETL must ensure the correctness and predictability of its results. At the same time, a streaming ETL system must scale with the number of incoming data sources and process data in as timely a manner as possible. The requirements fall into three categories:
</div><br/>
<div>
<ul style="list-style-type:square">
<li>ETL requirements</li>
<li>Streaming requirements</li>
<li>Infrastructure requirements</li>
</ul>
</div><br/>
<div>
<i>ETL Requirements (Data Collection + Bulk Loading + Heterogeneous Data Types)</i>
</div><br/>
<div>
In the case of streaming data sources, data must be collected, queued, and routed to the appropriate processing channels. The data collection mechanism should be able to turn traditional ETL data sources into streaming ETL sources, and it should scale with the number of data sources. A streaming ETL engine must be able to bulk load freshly transformed data into the data warehouse. It should also have data routing capability so that semantically related data can be loaded into multiple target systems.
</div><br/>
<div>
<i>Streaming Requirements (Out-of-Order and Missing Tuples + Dataflow Ordering + Exactly-Once Processing)</i>
</div><br/>
<div>
When the number of data sources or the data volume is huge, data may arrive out of timestamp order, and sometimes data may be missing altogether. Waiting for things to sort themselves out can introduce unacceptable latency. The authors propose using timeout values and predictive techniques (e.g. regression) to overcome these issues. To improve performance, streaming ETL should break large batches into smaller ones, and large operations into a number of smaller operations. Streaming ETL must use ordering constraints to ensure that these smaller operations on smaller batches still produce the same result as their larger counterparts. Also, any data migration to and from the streaming ETL engine must occur once and only once.
</div><br/>
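<div>
The ordering and exactly-once requirements above can be illustrated with a small sketch. It is not taken from the paper: the <code>RunningTotal</code> operator, the <code>split_batch</code> helper, and the batch ids are hypothetical names chosen for illustration. It shows that order-preserving sub-batches give the same result as the original batch, and that a redelivered batch is ignored.
</div><br/>

```python
from typing import Iterable, List, Set


def split_batch(batch: List[int], size: int) -> List[List[int]]:
    """Break one large batch into smaller sub-batches, preserving tuple order."""
    return [batch[i:i + size] for i in range(0, len(batch), size)]


class RunningTotal:
    """Toy order-sensitive operator: keeps a running sum and its history."""

    def __init__(self) -> None:
        self.total = 0
        self.history: List[int] = []
        self.seen_batch_ids: Set[str] = set()

    def apply(self, batch_id: str, tuples: Iterable[int]) -> bool:
        # Exactly-once (assumed mechanism): a batch id that was already
        # applied is skipped, so redelivery does not change the state.
        if batch_id in self.seen_batch_ids:
            return False
        self.seen_batch_ids.add(batch_id)
        for value in tuples:
            self.total += value
            self.history.append(self.total)
        return True


big_batch = [5, 1, 4, 2, 3, 6]

# Process the batch in one go.
whole = RunningTotal()
whole.apply("b0", big_batch)

# Process the same data as ordered sub-batches of two tuples each.
pieces = RunningTotal()
for n, sub in enumerate(split_batch(big_batch, 2)):
    pieces.apply(f"b0-{n}", sub)

# Simulated redelivery of the second sub-batch: it is ignored.
pieces.apply("b0-1", [4, 2])

print(whole.history == pieces.history)
```

<div>
Tracking applied batch ids is only one possible way to get once-and-only-once behaviour; the paper's prototype relies on transactional guarantees instead.
</div><br/>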
<div>
<i>Infrastructure Requirements (Local Storage + ACID Transactions + Scalability + Data Freshness and Latency)</i>
</div><br/>
<div>
Any ETL or data ingestion pipeline needs local storage for temporarily staging new batches of data while they are being prepared for loading into the backend data warehouse. Streaming ETL is no different. Local storage also helps to ensure the correctness of the temporal ordering and alignment of the data. Since a streaming ETL engine processes multiple streams at once, and each dataflow instance may try to modify the same state simultaneously, streaming ETL is expected to follow ACID transaction semantics. Streaming ETL must also ensure the scalability of data ingestion and the freshness of the data.
</div><br/>
<div>
<i><b>Streaming ETL Architecture</b></i>
</div><br/>
<div>
The authors propose a new architecture based on the above requirements. This new architecture has four primary components:
</div><br/>
<div>
<b>Data collection:</b> This component has a collection of data collectors. These data collectors primarily serve as messaging queues. Data collectors consume data from different sources, create logical batches of data and push them to the streaming ETL engine.
</div><br/>
<div>
<b>Streaming ETL:</b> This component contains a range of ETL tools, including data cleaning and transformation operators. A dataflow graph can be built from these operators to massage the incoming batches of data into normalised data. Once the data has been fully cleaned and transformed, it can either be pushed into the data warehouse or pulled by the data warehouse.
</div><br/>
<div>
<b>OLAP backend:</b> This component consists of a query processor and one or several OLAP engines. Each OLAP engine contains its own data warehouse, as well as a delta data warehouse. Both data warehouses have the same schema. The streaming ETL engine writes all updates to the delta data warehouse, and the OLAP engine periodically merges these updates into the full data warehouse.
</div><br/>
<div>
<b>Data migrator:</b> The data migrator ensures that no batch of data gets lost as it moves from the streaming ETL component to the OLAP backend. It should also fully support ACID transactions.
</div><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6m1EiMxqiUIyRYe3TX4MDhxDhZES_2xEQyALAWfH2OahpwEpTDM0esFhC3fKfxPeYCJw5yyR4UFUNroHc-xFuxMB9aBu_UZm_W9UUg8YAm8fbn0n2FmAJdKfJt-0uxUUW5vSEClABiqA/s1600/Screen+Shot+2017-08-28+at+11.03.16.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6m1EiMxqiUIyRYe3TX4MDhxDhZES_2xEQyALAWfH2OahpwEpTDM0esFhC3fKfxPeYCJw5yyR4UFUNroHc-xFuxMB9aBu_UZm_W9UUg8YAm8fbn0n2FmAJdKfJt-0uxUUW5vSEClABiqA/s320/Screen+Shot+2017-08-28+at+11.03.16.png" width="320" height="190" data-original-width="440" data-original-height="261" /></a></div><br/>
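<div>
The flow through these components can be sketched in a few lines. This is a toy in-memory model, not the authors' implementation: the class names, the sensor rows, and the two operators are assumptions made for illustration, and the data migrator and its transactional guarantees are left out.
</div><br/>

```python
from typing import Callable, Dict, List

Batch = List[dict]


class DataCollector:
    """Groups incoming messages into logical batches and pushes them onward."""

    def __init__(self, batch_size: int, push: Callable[[Batch], None]) -> None:
        self.batch_size = batch_size
        self.push = push
        self.pending: Batch = []

    def consume(self, message: dict) -> None:
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.push(self.pending)
            self.pending = []


def clean(batch: Batch) -> Batch:
    """Cleaning operator: drop rows with a missing reading."""
    return [row for row in batch if row.get("temp_f") is not None]


def to_celsius(batch: Batch) -> Batch:
    """Transformation operator: normalise Fahrenheit readings to Celsius."""
    return [
        {"sensor": row["sensor"], "temp_c": (row["temp_f"] - 32) * 5 / 9}
        for row in batch
    ]


class OlapEngine:
    """Full warehouse plus a delta warehouse with the same schema."""

    def __init__(self) -> None:
        self.full: Dict[str, dict] = {}
        self.delta: Dict[str, dict] = {}

    def write_delta(self, batch: Batch) -> None:
        for row in batch:
            self.delta[row["sensor"]] = row

    def merge(self) -> int:
        """Periodic merge of the delta into the full warehouse."""
        merged = len(self.delta)
        self.full.update(self.delta)
        self.delta.clear()
        return merged


olap = OlapEngine()


def etl_engine(batch: Batch) -> None:
    # A linear dataflow graph: clean, then transform, then write to the delta.
    for operator in (clean, to_celsius):
        batch = operator(batch)
    olap.write_delta(batch)


collector = DataCollector(batch_size=2, push=etl_engine)
for message in [
    {"sensor": "a", "temp_f": 212.0},
    {"sensor": "b", "temp_f": None},
    {"sensor": "c", "temp_f": 32.0},
]:
    collector.consume(message)
collector.flush()

rows_in_delta = len(olap.delta)
merged = olap.merge()
print(rows_in_delta, merged, sorted(olap.full))
```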
<div>
The authors have built a proof-of-concept implementation of this new architecture using Apache Kafka, S-Store, Intel’s BigDAWG polystore, and Postgres.
</div><br/>
<div>
The authors also try to answer another important question: how frequently should a streaming ETL system migrate data to the data warehouse? There are two methods: push (the ingestion engine periodically pushes data to the warehouse) and pull (the warehouse pulls data from the ingestion engine when it is needed). The authors ran an experiment to test the pros and cons of each method. According to them, pulling new data with each query is the best option if minimising data staleness is the priority. They also suggest that smaller, more frequent migrations are preferable in both the push and pull scenarios.
</div><br/>
<div>
<b>Conclusion</b>
</div><br/>
<div>
The authors think that streaming ETL can be extended into an all-in-one ingestion and analytics engine specifically for time-series data, which they call Metronome (time-series ETL). This paper focuses on the functional requirements of streaming ETL and presents a proof-of-concept implementation based on these requirements.
</div><br/>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-70589472604426690832017-08-06T10:33:00.000-07:002017-08-06T10:33:08.047-07:00Visualising Software Architecture Effectively in Service Description<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
Some days ago one of my team members told me about <b>Simon Brown's C4 model</b>. Since then I have been using it to document software architecture. This presentation is about the diagrams that I draw (or like to see) in a service description based on the C4 model.<br/><br/>
</div>
<div>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/qQv4aFy3kZ5QWF" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/sanroy/visualizing-software-architecture-effectively-in-service-description" title="Visualizing Software Architecture Effectively in Service Description" target="_blank">Visualizing Software Architecture Effectively in Service Description</a> </strong> from <strong><a target="_blank" href="https://www.slideshare.net/sanroy">Sanjoy Kumar Roy</a></strong> </div>
</div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-16059417515202601182017-08-04T13:16:00.000-07:002017-08-04T13:16:14.327-07:00Paper Summary - Prioritizing Attention in Fast Data: Principles and Promise<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<center>
<a href="http://www.bailis.org/papers/fastdata-cidr2017.pdf"><b>Prioritizing Attention in Fast Data: Principles and Promise</b></a><br/>
<i>
Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri<br/>
Stanford InfoLab<br/><br/><br/>
</i>
</center>
</div>
<div>
Processing and interpreting huge volumes of data in motion (fast data) to get timely answers is challenging and sometimes infeasible due to the scarcity of resources (both human and computational). Human attention is limited. According to the authors, a new generation of analytic systems is needed to bridge the gap between limited human attention and the growing volume of data. This new type of analytic system will prioritise attention in fast data. In this paper, the authors propose three principles that can be used to design and develop such fast data analytic systems:<br/><br/>
</div>
<div>
<i><b>Principle 1: Prioritise Output – The design must deliver more information using less output.</b></i><br/><br/>
</div>
<div>
A fast data analytic system should produce less output, but of good quality. If a system produces a lot of raw output data, it becomes difficult for a human to pay attention to it. For example, if the goal is to find out which device is producing the most problematic records, then ideally the system should simply return the device id with the count of records rather than every raw problematic record. According to the authors: <i>“A few general results are better than many specific results”.</i><br/><br/>
</div>
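<div>
The device example above amounts to a simple aggregation. Here is a minimal sketch, assuming hypothetical records with <code>device_id</code> and <code>status</code> fields; it is not MacroBase code.<br/><br/>
</div>

```python
from collections import Counter

# Hypothetical raw records; the field names are assumptions for illustration.
records = [
    {"device_id": "dev-1", "status": "error"},
    {"device_id": "dev-2", "status": "ok"},
    {"device_id": "dev-1", "status": "error"},
    {"device_id": "dev-3", "status": "error"},
    {"device_id": "dev-1", "status": "error"},
]

# Instead of returning every problematic record, return one count per device.
problem_counts = Counter(
    record["device_id"] for record in records if record["status"] == "error"
)
summary = problem_counts.most_common()

print(summary)
```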
<div>
<i><b>Principle 2: Prioritise Iteration – The design should allow iterative feedback-driven development.</b></i><br/><br/>
</div>
<div>
Modern analytics workflows consist of many steps, including feature engineering, model selection, parameter tuning, and performance engineering. It is difficult to get the final model right at the first attempt. This means that an analytics system should give end users the tools they need to improve these steps iteratively based on feedback. Today this is a very labour-intensive and time-consuming task. A fast data analytics system should lower this barrier, and it should be designed for modularity and incremental extensibility.<br/><br/>
</div>
<div>
<i><b>Principle 3: Prioritise Computation – The design must prioritise computation on inputs that most affect its output.</b></i><br/><br/>
</div>
<div>
One key property of fast data is that not all inputs contribute equally to the output. Therefore, it is a waste of valuable computational resources if the system gives equal importance to all inputs. But how will a fast data system select the inputs that contribute most to the output? According to the authors: <i>“fast data systems should start from the output and work backwards to the input, doing as little work as needed on each piece of data, prioritizing computation over data that matters most”.</i><br/><br/>
</div>
<div>
<b>MacroBase</b><br/><br/>
</div>
<div>
The authors have built a new fast data analysis engine called MacroBase, based on the principles outlined above. At present, MacroBase’s core dataflow pipelines contain a sequence of data ingestion, feature extraction, classification, and explanation operators. These operators perform tasks including feature extraction, supervised and unsupervised classification, explanation, and summarisation. MacroBase can process data as it arrives, and it can also process data in batch mode.<br/><br/>
</div>
<div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1maojsL79n-rwtc6ZnR0OxHCXZLUJ9t1yYbiB_mRCRp10cGakLuybYmbK6rKjxYhVrLv-1zL7ZKsWVu4w4mI2BOWOqoT9PSxK-L_jbq4gtkuIARhkJTlW6kPw2tZFIB8Ys6gb23Tg4pM/s1600/Screen+Shot+2017-08-04+at+14.31.29.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1maojsL79n-rwtc6ZnR0OxHCXZLUJ9t1yYbiB_mRCRp10cGakLuybYmbK6rKjxYhVrLv-1zL7ZKsWVu4w4mI2BOWOqoT9PSxK-L_jbq4gtkuIARhkJTlW6kPw2tZFIB8Ys6gb23Tg4pM/s320/Screen+Shot+2017-08-04+at+14.31.29.png" width="320" height="142" data-original-width="599" data-original-height="265" /></a></div>
</div>
<center>
MacroBase System Architecture
</center>
<div>
<br/>Users can engage at three interface levels with MacroBase:
<ul>
<li>Basic: A web-based graphical user interface. This is the easiest interface to use.</li>
<li>Intermediate: Custom pipeline configuration using Java.</li>
<li>Advanced: Custom dataflow operators written in Java/C++.</li>
</ul>
</div>
<div>
These interfaces enable users of varying skill levels to quickly obtain initial results and then improve result quality by iteratively refining their analyses. Users can highlight key performance metrics (such as power drain or latency) and metadata attributes (such as hostname or device id). MacroBase reports explanations of abnormal behaviour. For example, MacroBase may report that queries running on host 5 are 10 times more likely to experience high latency than queries on the rest of the cluster. MacroBase currently performs mostly anomaly or outlier detection; it does not do any deep machine learning training.<br/><br/>
</div>
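<div>
A statement such as “10 times more likely” can be expressed as a ratio of two rates. The sketch below is one plausible way to compute it, not MacroBase's actual code: the <code>risk_ratio</code> function and all the counts are made up for illustration.<br/><br/>
</div>

```python
def risk_ratio(
    exposed_outliers: int,
    exposed_total: int,
    other_outliers: int,
    other_total: int,
) -> float:
    """Rate of outliers with an attribute divided by the rate without it."""
    exposed_rate = exposed_outliers / exposed_total
    other_rate = other_outliers / other_total
    return exposed_rate / other_rate


# Made-up counts: 50 of 100 queries on host 5 had high latency,
# against 45 of 900 queries on the rest of the cluster.
ratio = risk_ratio(50, 100, 45, 900)
print(round(ratio, 2))
```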
<div>
<b>Conclusion</b><br/><br/>
</div>
<div>
Today we collect large volumes of data in analytical platforms. Some of this data is never read. Sometimes we go back and analyse the data to find the root cause of a problem after it has happened. Moreover, the tools we use for this kind of analysis are not easily accessible, and the process is time-consuming. I think these design principles provide good guidance for designing and building a new generation of analytics engines that can process huge volumes of data and produce good-quality output in a timely manner.<br/><br/>
</div>
<div>
More on this can be found here:
<ul>
<li><a href="http://www.bailis.org/papers/macrobase-sigmod2017.pdf">MacroBase: Prioritizing Attention in Fast Data</a></li>
<li><a href="https://github.com/stanford-futuredata/macrobase">https://github.com/stanford-futuredata/macrobase</a></li>
</ul>
</div>
<br/>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-69357290987894889502017-08-03T15:20:00.000-07:002017-08-03T15:20:06.506-07:00Hypermedia and how to document it effectively<div dir="ltr" style="text-align: left;" trbidi="on">
<iframe src="//www.slideshare.net/slideshow/embed_code/key/NQEf69TOIx9eD3" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/sanroy/hypermedia-api-and-how-to-document-it-effectively" title="Hypermedia API and how to document it effectively" target="_blank">Hypermedia API and how to document it effectively</a> </strong> from <strong><a target="_blank" href="https://www.slideshare.net/sanroy">Sanjoy Kumar Roy</a></strong> </div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-44812083817103855472017-08-03T15:11:00.000-07:002017-08-03T15:11:11.491-07:00An introduction to OAuth 2<div dir="ltr" style="text-align: left;" trbidi="on">
<iframe src="//www.slideshare.net/slideshow/embed_code/key/3hqsZOCs8oEj4b" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/sanroy/an-introduction-to-oauth-2-78551046" title="An introduction to OAuth 2" target="_blank">An introduction to OAuth 2</a> </strong> from <strong><a target="_blank" href="https://www.slideshare.net/sanroy">Sanjoy Kumar Roy</a></strong></div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-25438482626332707082017-07-17T15:20:00.000-07:002017-07-17T15:20:42.296-07:00Transaction<div dir="ltr" style="text-align: left;" trbidi="on">
<iframe src="//www.slideshare.net/slideshow/embed_code/key/h63fdJy0ogFRZ5" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/sanroy/transaction-77973678" title="Transaction" target="_blank">Transaction</a> </strong> from <strong><a target="_blank" href="https://www.slideshare.net/sanroy">Sanjoy Kumar Roy</a></strong> </div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-37211676035603090812016-05-07T03:50:00.001-07:002016-05-07T03:50:09.213-07:00Microservice Architecture Design Principles<div dir="ltr" style="text-align: left;" trbidi="on">
<iframe src="https://www.slideshare.net/slideshow/embed_code/key/jiub9vIMCkZ66A" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>
<br /></div>Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-12253427938621619152016-04-17T12:57:00.000-07:002016-04-21T12:25:02.322-07:00Raft Consensus Algorithm<div dir="ltr" style="text-align: left;font-family: Georgia,'Times New Roman',serif;font-size: 1.2em;" trbidi="on">
<div style="line-height: 150%; margin-top: 10px;">
Imagine we have a single node database server that stores a single value. We also have a client that can send a value to the database server.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEje1jk3iTviZf3NJ9RqOJ1iFkxPXKS0GaV9qt9AWKXvP59dg815vK4DGig02xpZTlKFgMZwS6RZUf1O1o2V2zXX114TcbQeEFd4T57QOWOAa8uzgoNJbx4SlG8wURrOc3Y1obBFNIvhbGQ/s1600/RaftSingleNode.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEje1jk3iTviZf3NJ9RqOJ1iFkxPXKS0GaV9qt9AWKXvP59dg815vK4DGig02xpZTlKFgMZwS6RZUf1O1o2V2zXX114TcbQeEFd4T57QOWOAa8uzgoNJbx4SlG8wURrOc3Y1obBFNIvhbGQ/s320/RaftSingleNode.jpg" width="320" /></a>
</div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
Coming to agreement (or consensus) on that value is easy with one node. But how do we come to consensus if we have more than one node?
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCAyP5xdyRZsZ-V4FA_fw_UL8IYsb1kHTPsSkFvYGcbtjwB-BjgDRxoYqwMlG7FMOoASwneyMKCP647sjculT453iBG0-BL7R5FA1kPNnWTuohOXiMo4QsKuG1RfFEpW1l7C_BhUA_OcY/s1600/RaftMultipleNodes.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCAyP5xdyRZsZ-V4FA_fw_UL8IYsb1kHTPsSkFvYGcbtjwB-BjgDRxoYqwMlG7FMOoASwneyMKCP647sjculT453iBG0-BL7R5FA1kPNnWTuohOXiMo4QsKuG1RfFEpW1l7C_BhUA_OcY/s320/RaftMultipleNodes.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
Here we need distributed consensus. Distributed consensus protocols allow nodes in an unreliable distributed system to agree on an ordering of events. Raft is a protocol for implementing distributed consensus.
</div>
<div style="line-height: 150%; margin-top: 10px;">
Distributed consensus is typically framed in the context of a replicated state machine, drawing a clear distinction between the state machine (the fault tolerant application), the replicated log and the consensus module.
</div>
<div style="font-weight:bold;line-height: 150%; margin-top: 10px;">
Replicated State Machine
</div>
<div style="line-height: 150%; margin-top: 10px;">
Replicated state machines are typically implemented using a replicated log. Each server stores a log containing a series of commands, which its state machine executes in order. Each log contains the same commands in the same order, so each state machine processes the same sequence of commands. Since the state machines are deterministic, each computes the same state and the same sequence of outputs.
</div>
<div style="font-size: 1em; line-height: 150%; margin-top: 10px;">
The consensus algorithm is responsible for keeping the replicated log consistent. The consensus module on a server receives commands from clients and adds them to its log. It communicates with the consensus modules on other servers to ensure that every log eventually contains the same requests in the same order, even if some servers fail. Once commands are properly replicated, each server’s state machine processes them in log order, and the outputs are returned to clients. As a result, the servers appear to form a single, highly reliable state machine.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB1GHUbRDhjjhmyiPwtf1MYYcsbj-sq3d2P_tq1NpuCAUCya9_I0f-0CnVtmSlV-Ymblroph4UL9TxHFgfzcLtlVRYh8nSeyDIdvl5juzXzJSEqo4m_bgXsK98bwizbEELEW0UvRjKYdk/s1600/ReplicatedStateMachine.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB1GHUbRDhjjhmyiPwtf1MYYcsbj-sq3d2P_tq1NpuCAUCya9_I0f-0CnVtmSlV-Ymblroph4UL9TxHFgfzcLtlVRYh8nSeyDIdvl5juzXzJSEqo4m_bgXsK98bwizbEELEW0UvRjKYdk/s320/ReplicatedStateMachine.jpg" width="320" /></a></div>
<br /></div>
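<div style="line-height: 150%; margin-top: 10px;">
The determinism argument can be checked with a toy example. In the sketch below, the key-value state machine and its command format are assumptions made for illustration; three replicas apply the same log in the same order and end up with the same state and the same outputs.
</div>

```python
from typing import Any, Dict, List, Tuple

Command = Tuple[str, str, Any]  # (operation, key, value)


class KeyValueStateMachine:
    """A deterministic state machine: same commands in, same state out."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def apply(self, command: Command) -> Any:
        op, key, value = command
        if op == "set":
            self.data[key] = value
            return value
        if op == "delete":
            return self.data.pop(key, None)
        raise ValueError(f"unknown operation: {op}")


# The replicated log: every server holds the same commands in the same order.
log: List[Command] = [
    ("set", "x", 1),
    ("set", "y", 2),
    ("set", "x", 3),
    ("delete", "y", None),
]

replicas = [KeyValueStateMachine() for _ in range(3)]
outputs = [[machine.apply(command) for command in log] for machine in replicas]

print(replicas[0].data, outputs[0])
```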
<div style="line-height: 150%; margin-top: 10px;">
Replicated state machines are used to solve a variety of fault-tolerance problems in distributed systems.
</div>
<div style="font-weight:bold;line-height: 150%; margin-top: 10px;">
Properties of consensus algorithm
</div>
<div style="line-height: 150%; margin-top: 10px;">
<ol>
<li>
They ensure safety (never returning an incorrect result) under all non-Byzantine conditions, including network delays, partitions, and packet loss, duplication, and reordering
</li>
<li>
They are fully functional as long as any majority of servers are operational and can communicate with each other and with the clients. For example, a cluster of five servers can tolerate the failure of any two servers
</li>
<li>
They do not depend on timing to ensure the consistency of the logs
</li>
<li>
In the common case, a command can complete as soon as a majority of the servers have responded to a single round of remote procedure calls. A minority of slow servers need not affect overall system performance.
</li>
</ol>
</div>
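<div style="line-height: 150%; margin-top: 10px;">
The majority arithmetic behind the second property is easy to work out. The helper names below are hypothetical; the calculation itself follows directly from requiring more than half of the servers.
</div>

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Servers that can fail while a majority remains operational."""
    return cluster_size - majority(cluster_size)


for size in (3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
```

<div style="line-height: 150%; margin-top: 10px;">
Note that a four-server cluster tolerates only one failure, the same as a three-server cluster, which is one reason odd cluster sizes are common.
</div>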
<div style="line-height: 150%; margin-top: 10px;">
Now we will see how Raft works. Before that, we need to familiarise ourselves with some Raft concepts.
</div>
<div style="line-height: 150%; margin-top: 10px;">
Raft is a consensus algorithm for managing a replicated log. Raft uses strong leadership. It first elects a leader, which has complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them to the other servers, and tells those servers when it is safe to apply the entries to their state machines. When a leader fails or becomes disconnected from the other servers, a new leader is elected. Clients are external to the system and must contact the leader directly to communicate with the system.
</div>
<div style="font-weight:bold;line-height: 150%; margin-top: 10px;">
Raft cluster
</div>
<div style="line-height: 150%; margin-top: 10px;">
Typically a Raft cluster is set up with five nodes, so that the system can tolerate two failures.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjW0Q29JSeGTVpB8401sfBCCvTlc7eZMOqvaJa6NmSlBaV8YGsx-78XZlO1gXtCVNGKcboxGRukKwRvMXElCDPIEqsjzr8ApdaBXD6x5lsfVS5Umwqddmqm-WGCkBlG5wEsBczbFZXYqew/s1600/RaftCluster.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjW0Q29JSeGTVpB8401sfBCCvTlc7eZMOqvaJa6NmSlBaV8YGsx-78XZlO1gXtCVNGKcboxGRukKwRvMXElCDPIEqsjzr8ApdaBXD6x5lsfVS5Umwqddmqm-WGCkBlG5wEsBczbFZXYqew/s1600/RaftCluster.jpg" /></a></div>
<br /></div>
<div style="font-weight:bold; line-height: 150%; margin-top: 10px;">
Server States
</div>
<div style="line-height: 150%; margin-top: 10px;">
According to the Raft protocol, a node can be in one of three states:
<br />
<ul style="line-height: 150%;">
<li>
Follower: A follower is a passive node. It does not issue any request on its own but simply responds to the requests from the leader and the candidates
</li>
<li>
Candidate: A candidate is an active node which is attempting to become a Leader. It initiates a request for votes from other nodes. A candidate that receives votes from a majority of the full cluster becomes the new leader
</li>
<li>
Leader: The leader is an active node which is currently leading the cluster. This node handles requests from clients. If a client contacts a follower, the follower redirects the client to the leader</li>
</ul>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlJbi7YI75K4YWk6Fq0oAgg9sIs-TipvI6Wvx3CnwA1BaZO5202qo8Wyt3GZo8mG15Kni7Xu_N3K806XEOkPiokWF9EBEzOiQP7EOHZH8yvPhq6ovryLc-xmyH5PKRIj6nMpA7yiPGb6g/s1600/RaftServerStates.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlJbi7YI75K4YWk6Fq0oAgg9sIs-TipvI6Wvx3CnwA1BaZO5202qo8Wyt3GZo8mG15Kni7Xu_N3K806XEOkPiokWF9EBEzOiQP7EOHZH8yvPhq6ovryLc-xmyH5PKRIj6nMpA7yiPGb6g/s320/RaftServerStates.jpg" width="320" /></a></div>
</div>
<div style="line-height: 150%; margin-top: 10px;">
How does Raft detect obsolete information such as a stale leader? Raft detects this using a concept called a term.
</div>
<div style="font-weight:bold;line-height: 150%; margin-top: 10px;">
Term
</div>
<div style="line-height: 150%; margin-top: 10px;">
Raft divides time into terms of arbitrary length. Terms are numbered with consecutive integers and act as a logical clock in Raft. Each term begins with an election in which one or more candidates attempt to become leader. If a candidate wins the election, it serves as leader for the rest of the term. A term may also end with no leader; in this case a new term begins with a new election. Raft ensures that there is at most one leader in a given term.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2f_M-DxOaRx-87f6lWZe4VbH04lhwX4CXGUGUC7DZVxQyCjFpoIlypd9Qc3Qas9TwcxL7WKJUY62puc3ik2jbsknjAc6UrP-DPzHo-D5Fqrl4PEuHdem7fIlBpj8m7nBdop-fGqo7FEY/s1600/RaftTerms.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2f_M-DxOaRx-87f6lWZe4VbH04lhwX4CXGUGUC7DZVxQyCjFpoIlypd9Qc3Qas9TwcxL7WKJUY62puc3ik2jbsknjAc6UrP-DPzHo-D5Fqrl4PEuHdem7fIlBpj8m7nBdop-fGqo7FEY/s320/RaftTerms.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
Each server stores its view of the current term in persistent storage; the term increases monotonically over time. A server’s term is updated only when it starts (or restarts) an election, or when it learns from another server that its term is out of date. All messages include the source server’s term. The receiving server compares it with its own: if the receiver’s term is larger, it rejects the message with a negative response; if the receiver’s term is smaller, it first updates its term to the source’s (a stale leader or candidate also steps down to follower) and then processes the message; if the terms are equal, it processes the message as is.
</div>
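<div style="line-height: 150%; margin-top: 10px;">
The term comparison can be written as a small rule. The sketch below is a simplification under stated assumptions: the <code>Server</code> class and <code>observe_term</code> method are hypothetical names, and persistence of the term is omitted.
</div>

```python
from dataclasses import dataclass


@dataclass
class Server:
    current_term: int = 0
    role: str = "follower"

    def observe_term(self, message_term: int) -> bool:
        """Return True if the message should be processed, False if rejected."""
        if message_term > self.current_term:
            # Our term is out of date: adopt the newer term and step down.
            self.current_term = message_term
            self.role = "follower"
            return True
        if message_term == self.current_term:
            return True
        # The sender's term is stale: reject with a negative response.
        return False


server = Server(current_term=3, role="leader")
rejected_stale = not server.observe_term(2)   # stale sender, state unchanged
accepted_newer = server.observe_term(5)       # adopts term 5, becomes follower
print(rejected_stale, accepted_newer, server.current_term, server.role)
```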
<div style="font-weight:bold; line-height: 150%; margin-top: 10px;">
Types of messages
</div>
<div style="line-height: 150%; margin-top: 10px;">
Raft servers communicate with each other using remote procedure calls (RPCs). There are three types of messages in Raft:
<br />
<ul>
<li>
RequestVote: this message is used by the candidates during the election.
</li>
<li>
AppendEntries: this message is initiated by the leader to replicate the log entries and to provide a form of heartbeat to the followers.
</li>
<li>
InstallSnapshot: this message is used by the leader to send a snapshot of its log to followers that are too far behind.
</li>
</ul>
</div>
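<div style="line-height: 150%; margin-top: 10px;">
As a rough picture of what these messages carry, here is a sketch using plain data classes. The field names are adapted from the Raft paper and should be treated as illustrative; <code>InstallSnapshot</code> is simplified, and the <code>is_heartbeat</code> helper is a hypothetical addition.
</div>

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LogEntry = Tuple[int, str]  # (term, command)


@dataclass
class RequestVote:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass
class AppendEntries:
    term: int
    leader_id: str
    prev_log_index: int
    prev_log_term: int
    leader_commit: int
    entries: List[LogEntry] = field(default_factory=list)

    @property
    def is_heartbeat(self) -> bool:
        # An AppendEntries message with no entries serves as a heartbeat.
        return not self.entries


@dataclass
class InstallSnapshot:
    term: int
    leader_id: str
    last_included_index: int
    last_included_term: int
    data: bytes


heartbeat = AppendEntries(
    term=2, leader_id="n1", prev_log_index=4, prev_log_term=2, leader_commit=4
)
replication = AppendEntries(
    term=2,
    leader_id="n1",
    prev_log_index=4,
    prev_log_term=2,
    leader_commit=4,
    entries=[(2, "set x 5")],
)
print(heartbeat.is_heartbeat, replication.is_heartbeat)
```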
<div style="font-weight:bold; line-height: 150%; margin-top: 10px;">
Leader Election
</div>
<div style="line-height: 150%; margin-top: 10px;">
In Raft, there are two timeout settings which control elections. The first is the election timeout.
The election timeout is the amount of time a follower waits before becoming a candidate. It is randomized, for example between 150 ms and 300 ms. After the election timeout elapses, the follower becomes a candidate, starts a new election term, votes for itself, and sends RequestVote messages to the other servers.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX8rZPgRh_OCou2orweoB0oFB8m8lwNFoxsxP-HMPpDZw2ECkJqeFVB8TpJMIvEjmKNq2fBo3RRBX90sw-z0VpiJsSWGDD-El9V1Zy0d60jmj4tRblnYmv26DQ8FkimrMO8OFzQlq0ktc/s1600/LeaderElection1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX8rZPgRh_OCou2orweoB0oFB8m8lwNFoxsxP-HMPpDZw2ECkJqeFVB8TpJMIvEjmKNq2fBo3RRBX90sw-z0VpiJsSWGDD-El9V1Zy0d60jmj4tRblnYmv26DQ8FkimrMO8OFzQlq0ktc/s320/LeaderElection1.jpg" width="320" /></a></div>
<br /></div>
<div style="font-size: 1em; line-height: 150%; margin-top: 10px;">
If the receiving server hasn't voted yet in this term, it votes for the candidate and resets its election timeout. Once a candidate has votes from a majority of the servers, it becomes the leader. The leader begins sending AppendEntries messages to its followers at intervals specified by the heartbeat timeout (the second timeout setting), and followers respond to each AppendEntries message. This election term continues until a follower stops receiving heartbeats and becomes a candidate. Because each server votes at most once per term, requiring a majority of votes guarantees that at most one leader can be elected per term.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_kPR2TTHFP07LCNgWq4hKmDA3LluMnbgwadb_I-E-BovDRtL6zRXY1iVW7aIj33TkQ88YVcC1bYW-OdBOHmV5bYqXV_WqIH6O7yE87QmgfAbzqhH4FrfVlLhOMr0E3tZfWpIb7nB2PsI/s1600/LeaderElection2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_kPR2TTHFP07LCNgWq4hKmDA3LluMnbgwadb_I-E-BovDRtL6zRXY1iVW7aIj33TkQ88YVcC1bYW-OdBOHmV5bYqXV_WqIH6O7yE87QmgfAbzqhH4FrfVlLhOMr0E3tZfWpIb7nB2PsI/s320/LeaderElection2.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
This process is called Leader Election. All the changes to the system will now go through the leader.
</div>
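<div style="line-height: 150%; margin-top: 10px;">
The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference implementation: the Server class, its field names and the run_election helper are hypothetical, messages are modelled as direct method calls, and the log up-to-date check that real Raft applies before granting a vote is left out.
</div>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical, minimal per-server election state."""
    server_id: int
    current_term: int = 0
    voted_for: Optional[int] = None

    def handle_request_vote(self, candidate_id: int, term: int) -> bool:
        """Grant at most one vote per term."""
        if term < self.current_term:
            return False  # request from a stale term
        if term > self.current_term:
            # A newer term starts with a clean ballot.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Server, peers: list) -> bool:
    """The candidate starts a new term, votes for itself and asks peers for votes."""
    candidate.current_term += 1
    candidate.voted_for = candidate.server_id
    votes = 1  # the candidate's own vote
    for peer in peers:
        if peer.handle_request_vote(candidate.server_id, candidate.current_term):
            votes += 1
    cluster_size = len(peers) + 1
    # Strict majority of the whole cluster, so two leaders cannot win the same term.
    return votes > cluster_size // 2
```

<div style="line-height: 150%; margin-top: 10px;">
Because each server records whom it voted for in the current term, a second candidate asking in the same term is refused, which is what keeps the election to one leader per term.
</div>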
<div style="line-height: 150%; margin-top: 10px;">
There may be a situation when a candidate neither wins nor loses the election. For example, two followers become candidates at the same time and votes could be split so that no candidate obtains a majority. When this happens, each candidate will time out and start a new election by incrementing its term and initiating another round of RequestVote messages. Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly.
</div>
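<div style="line-height: 150%; margin-top: 10px;">
To get a feel for why randomized timeouts make split votes rare, here is a small sketch, assuming the 150ms to 300ms range quoted earlier. The function names and the message_delay_ms parameter are made up for illustration; the idea is simply that only servers whose timers expire within one message delay of the earliest timer can end up competing as candidates.
</div>

```python
import random

ELECTION_TIMEOUT_MS = (150, 300)  # range quoted in the text


def draw_timeouts(server_ids, rng):
    """Each server draws its own election timeout from the randomized range."""
    low, high = ELECTION_TIMEOUT_MS
    return {sid: rng.uniform(low, high) for sid in server_ids}


def first_candidate(timeouts):
    """The server whose timeout expires first becomes a candidate first."""
    return min(timeouts, key=timeouts.get)


def simultaneous_candidates(timeouts, message_delay_ms):
    """Servers that time out before the first candidate's RequestVote could reach them."""
    earliest = min(timeouts.values())
    return [sid for sid, t in timeouts.items() if t - earliest < message_delay_ms]
```

<div style="line-height: 150%; margin-top: 10px;">
With a message delay that is small compared to the 150ms spread, the list returned by simultaneous_candidates usually contains a single server, so one candidate normally collects its votes before anyone else times out.
</div>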
<div style="font-weight:bold; line-height: 150%; margin-top: 10px;">
Log Replication
</div>
<div style="line-height: 150%; margin-top: 10px;">
Once a leader is elected, it needs to replicate all changes to the system to all servers. This is done using the same AppendEntries messages that are used for heartbeats.
</div>
<div style="line-height: 150%; margin-top: 10px;">
First, a client sends a change to the leader. The change is appended to the leader's log. This log entry is not yet committed, so it does not update the leader's value. The leader then sends the change to the followers on the next heartbeat.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR6ZRsdzSlg4yL0EXEntYQQ_pfOFgsgndarf7xvXzIBYfquQvKF7_gfkca_M6KyrlYSkfJwibzUQn7MehBosbfZ-0G3ra9Ia-yBCvuawXyD-I4GzZvQsv-tBEVYjLpF22p7hr-wCoPtHQ/s1600/LogRep1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR6ZRsdzSlg4yL0EXEntYQQ_pfOFgsgndarf7xvXzIBYfquQvKF7_gfkca_M6KyrlYSkfJwibzUQn7MehBosbfZ-0G3ra9Ia-yBCvuawXyD-I4GzZvQsv-tBEVYjLpF22p7hr-wCoPtHQ/s320/LogRep1.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
The change gets replicated in the followers' logs.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5WNBQt8s9BsvJatChYCKbyhK8GVeJ5CSFrywKa7LbJMdGE2dXyQoeq4t0b0_NBe8onfpbvwji-0I5Jp3k-OU1xFbnm_PsaautuMiRNtGbNofSG36UWz08si7qgmFxu2g-CFx10Bt3QuE/s1600/LogRep2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5WNBQt8s9BsvJatChYCKbyhK8GVeJ5CSFrywKa7LbJMdGE2dXyQoeq4t0b0_NBe8onfpbvwji-0I5Jp3k-OU1xFbnm_PsaautuMiRNtGbNofSG36UWz08si7qgmFxu2g-CFx10Bt3QuE/s320/LogRep2.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
An entry is committed on the leader once a majority of the servers in the cluster (the leader included) have written it to their logs.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOwkZLcLcoxOuipXBPAk-NXewI7DA4u9IBT2NKiU-VIto4cXsBO09SzwwL0GBBEcSJbKto2Pcnl2Au0Gb59-QLgxuM64dH8dn5LTJQCxcuTSjLdR0TuH9gzkVffy4caH0d7Uf4DhXJDv0/s1600/LogRep3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOwkZLcLcoxOuipXBPAk-NXewI7DA4u9IBT2NKiU-VIto4cXsBO09SzwwL0GBBEcSJbKto2Pcnl2Au0Gb59-QLgxuM64dH8dn5LTJQCxcuTSjLdR0TuH9gzkVffy4caH0d7Uf4DhXJDv0/s320/LogRep3.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
After receiving acknowledgements from the followers, the leader commits the entry.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixwXqCmgzWtlGNJaJ8cjENAilJWEqiUoh__1x-7bCx-BBAl_jCJPwJKjd50OezCtMeLRGwQU5S-KBB-wxGrBhcJeC-7skZ3WJz6J-MYZgBqsutoCZIm-3QHf1b6-WRjFKLXBeULDfPoXM/s1600/LogRep4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixwXqCmgzWtlGNJaJ8cjENAilJWEqiUoh__1x-7bCx-BBAl_jCJPwJKjd50OezCtMeLRGwQU5S-KBB-wxGrBhcJeC-7skZ3WJz6J-MYZgBqsutoCZIm-3QHf1b6-WRjFKLXBeULDfPoXM/s320/LogRep4.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
The leader then notifies the followers that the entry is committed.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7UE_I8PvjbiAnk4_AhT_u3uk02zjmYpBHQnPDaN0y4rDwKe_gXFSexl4mlm-LaiM2RqlS2xrmhdaM0B8EirZOXGXDrHcXQ78cwaULKvBn08vHj94lKHUxeMPPC_v7zW_kvK4WAmq3tq4/s1600/LogRep5.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7UE_I8PvjbiAnk4_AhT_u3uk02zjmYpBHQnPDaN0y4rDwKe_gXFSexl4mlm-LaiM2RqlS2xrmhdaM0B8EirZOXGXDrHcXQ78cwaULKvBn08vHj94lKHUxeMPPC_v7zW_kvK4WAmq3tq4/s320/LogRep5.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
The cluster has now come to consensus about the system state, and the leader sends the response to the client.
<br /><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ4_X7CoxSrGwQV9jJB6TlL9msKH2C6OTEGV-fzjL6O1Xh99MxClacTLK4r7K7HXLIEmS71IAppVy5-aRqLYfYppHRfhWU6kQvN2QRQy9ViZAxWZnh1oJlOP67eymF-CK_1VQRgK1ZhAk/s1600/LogRep6.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ4_X7CoxSrGwQV9jJB6TlL9msKH2C6OTEGV-fzjL6O1Xh99MxClacTLK4r7K7HXLIEmS71IAppVy5-aRqLYfYppHRfhWU6kQvN2QRQy9ViZAxWZnh1oJlOP67eymF-CK_1VQRgK1ZhAk/s320/LogRep6.jpg" width="320" /></a></div>
<br /></div>
<div style="line-height: 150%; margin-top: 10px;">
This process is called Log Replication.
</div>
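<div style="line-height: 150%; margin-top: 10px;">
The replication steps above can be condensed into a short sketch. It is only an illustration under simplifying assumptions: the Entry, Follower and Leader classes are hypothetical, replication happens through direct method calls instead of heartbeats over a network, and a follower is either reachable or not.
</div>

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Follower:
    log: list = field(default_factory=list)
    reachable: bool = True

    def append_entries(self, entries) -> bool:
        """Store the entries and acknowledge, unless this follower is unreachable."""
        if not self.reachable:
            return False
        self.log.extend(entries)
        return True


@dataclass
class Leader:
    term: int
    followers: list
    log: list = field(default_factory=list)
    commit_index: int = 0  # number of committed entries

    def client_request(self, command: str) -> bool:
        """Append, replicate, and commit only once a majority holds the entry."""
        entry = Entry(self.term, command)
        self.log.append(entry)  # uncommitted at this point
        acks = 1  # the leader's own copy counts
        for follower in self.followers:
            if follower.append_entries([entry]):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:
            self.commit_index = len(self.log)
            return True  # safe to answer the client
        return False
```

<div style="line-height: 150%; margin-top: 10px;">
In a five-server cluster the leader needs only two follower acknowledgements, because its own copy counts towards the majority.
</div>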
<div style="line-height: 150%; margin-top: 10px;">
Now, consider the case where some messages have been lost or servers have failed and recovered, leaving some logs incomplete. It is the leader's responsibility to fix this by replicating its log to all other servers. Each AppendEntries message carries the log index and term of the entry that immediately precedes the new entries. If the follower's log does not contain a matching entry at that index, the follower sends an unsuccessful response to the leader. The leader then knows that the follower's log is inconsistent and needs to be updated, so it decrements the previous log index (and the corresponding term) that it tracks for that server and retries. The leader keeps sending AppendEntries messages, each carrying more of its log, until the follower replies with success and is therefore up to date.
</div>
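<div style="line-height: 150%; margin-top: 10px;">
Here is a rough sketch of that repair loop. The function names and the tuple-based log are my own, and the follower side is simplified: it replaces everything after the match point with the leader's entries, whereas the paper only removes entries that actually conflict.
</div>

```python
def follower_append(log, prev_index, prev_term, entries):
    """Follower side: accept only if the entry before the new ones matches.

    `log` is a list of (term, command) tuples. Indexes are 1-based and
    prev_index == 0 means "start of the log".
    """
    if prev_index > len(log):
        return False  # the follower is missing entries
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # the follower has a conflicting entry
    del log[prev_index:]  # simplification: drop everything after the match point
    log.extend(entries)
    return True


def repair_follower(leader_log, follower_log):
    """Leader side: step back one entry at a time until the follower accepts."""
    next_index = len(leader_log) + 1  # optimistic first guess
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        if follower_append(follower_log, prev_index, prev_term,
                           leader_log[prev_index:]):
            return attempts
        next_index -= 1
```

<div style="line-height: 150%; margin-top: 10px;">
The loop always terminates, because in the worst case the leader backs up to the start of the log, where the consistency check trivially succeeds.
</div>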
<div style="line-height: 150%; margin-top: 10px;">
Each server keeps its log in persistent storage, including a history of all commands and their associated terms. Each server also has a commit index, which represents the most recent command to be applied to the replicated state machine. When the commit index is updated, the server passes all commands between the new and old commit index to the local application state machine.
</div>
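<div style="line-height: 150%; margin-top: 10px;">
A small sketch of how a server might hand committed commands to its state machine. The key-value state machine, the "SET key value" command format and the last_applied counter are assumptions made for the example; persistence is not modelled.
</div>

```python
class KeyValueStateMachine:
    """Hypothetical application state machine that understands 'SET key value'."""

    def __init__(self):
        self.data = {}

    def apply(self, command: str) -> None:
        op, key, value = command.split()
        if op == "SET":
            self.data[key] = value


class ServerState:
    def __init__(self):
        self.log = []            # list of (term, command); persisted in a real system
        self.commit_index = 0    # highest log index known to be committed (1-based)
        self.last_applied = 0    # highest log index already applied
        self.state_machine = KeyValueStateMachine()

    def advance_commit_index(self, new_commit_index: int) -> None:
        """Apply every entry between the old and the new commit index, in order."""
        bounded = min(new_commit_index, len(self.log))
        self.commit_index = max(self.commit_index, bounded)  # never moves backwards
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            _term, command = self.log[self.last_applied - 1]
            self.state_machine.apply(command)
```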
<div style="font-weight:bold;line-height: 150%; margin-top: 10px;">
Safety
</div>
<div style="line-height: 150%; margin-top: 10px;">
Raft guarantees the following safety properties:
<ul>
<li>
Election Safety: at most one leader can be elected in a given term.
</li>
<li>
Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries.
</li>
<li>
Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index.
</li>
<li>
Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms.
</li>
<li>
State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.
</li>
</ul>
</div>
<div style="font-weight:bold;line-height: 150%; margin-top: 10px;">
Conclusion
</div>
<div style="line-height: 150%; margin-top: 10px;">
The authors of Raft focus on understandability; the algorithm is designed to be easy to understand. According to the authors:
</div>
<div style="line-height: 150%; margin-top: 15px;font-size:1.5em;border-left: 10px solid #BDBDBD;padding-left:20px;">
<i>
"...our most important goal—and most difficult challenge—was understandability. It must be possible for a large audience to understand the algorithm comfortably. In addition, it must be possible to develop intuitions about the algorithm, so that system builders can make the extensions that are inevitable in real-world implementations."
</i>
</div>
<div style="line-height: 150%; margin-top: 10px;">
I like the effort Raft's authors have put into making the algorithm understandable. They have given many talks and created course materials, all of which you can find at <a href="https://raft.github.io/">https://raft.github.io/</a>.
</div>
<div style="font-weight:bold;line-height: 150%; margin-top: 15px;">
References
</div>
<div style="line-height: 150%; margin-top: 10px;">
<ul>
<li>
<a href="http://ramcloud.stanford.edu/raft.pdf">Raft paper</a> for a detailed explanation
</li>
<li>
<a href="http://www.cl.cam.ac.uk/~ms705/pub/papers/2015-osr-raft.pdf">Raft Refloated: Do We Have Consensus?</a> by Heidi Howard, Malte Schwarzkopf, Anil Madhavapeddy, and Jon Crowcroft
</li>
<li>
<a href="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-857.html">ARC: Analysis of Raft Consensus</a> by Heidi Howard
</li>
</ul>
</div>
</div>
<div style="font-size: 120%;">
<b>My talk on agile architecture in Agile Manchester 2015</b>
</div>
<div>
<i>Posted 2015-05-20</i>
</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Georgia, Times New Roman, serif;">I gave a talk on agile architecture at Agile Manchester 2015. Here are the slides:</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;"><iframe src="https://www.slideshare.net/slideshow/embed_code/key/2jIGmSjxoqk9TP" width="476" height="400" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe></span></div>
<div style="font-size: 120%;">
<b>Modularity</b>
</div>
<div>
<i>Posted 2015-03-01</i>
</div>
<div dir="ltr" style="text-align: left;font-family: Georgia,"Times New Roman",serif;font-size: 1.2em;" trbidi="on">
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
One of the challenging parts of any software design is managing complexity. By managing complexity effectively, we can respond to change quickly, improve time to market, reduce the cost of change and improve the stability of the system. Modularity is useful for managing quality and complexity in software systems.
</div>
<div style="font-style:italic;margin-top:10px;font-size: 1em;line-height:150%;">
<i>Modularity is the concept of breaking down a complex problem into smaller, simpler and more manageable problems.</i>
</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
In software design, modularity refers to the extent to which a software application may be divided into smaller modules. The goal of structuring an application or system in modules is to be able to develop, test and deploy them separately. Modularity also enforces separation of concerns by vertically or horizontally partitioning a system. It keeps a clear separation between business functionality and data/information.
</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
One way to achieve modularity is <i>functional decomposition</i>, which means that each module or sub-system has a clear domain responsibility and is part of a larger ecosystem. Each sub-system may consist of one or more independent modules.
</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
Benefits of functional decomposition:
<ul style="list-style-type: square;">
<li>Increase quality and reduce complexity</li>
<li>Parallel development and roll-outs</li>
<li>Horizontal load balancing of functionality</li>
<li>Reduced dependencies between different functional areas</li>
<li>Functional areas can scale separately and on demand</li>
<li>Asynchronous workflows that increase availability and evenly balance out peak loads</li>
<li>Smaller modules usually bring smaller data sets, which helps to reduce database size and database server workload and to simplify ORM mappings.</li>
</ul>
</div>
<div>
It is also very difficult to become agile or to apply agile methodologies without a modular system. A modular system lets you replace, upgrade or throw away a part of the system without much effect on the rest of it.
</div>
</div>
<div style="font-size: 120%;">
<b>Eventual Consistency</b>
</div>
<div>
<i>Posted 2015-02-15</i>
</div>
<div dir="ltr" style="text-align: left;font-family: Georgia,"Times New Roman",serif;font-size: 1.2em;" trbidi="on">
<div style="border-top-style: solid;border-top-color: maroon;border-bottom-style: solid;border-bottom-color: maroon;padding-top:10px;padding-bottom:10px;">Introduction</div>
<div style="font-style:italic;margin-top:10px;font-size: 1em;line-height:150%;font-color:maroon;">
Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.<sup>[1]</sup>
</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
To ensure high availability and scalability, a distributed system keeps copies of its data on multiple machines (often in different data centers). When a data item changes on one machine, that change has to be propagated to the other replicas. Propagation is not instantaneous because of network delay. This delay creates a <i><b>window of inconsistency</b></i> during which some of the copies have the most recent change and others do not. In other words, the copies are mutually inconsistent. However, the change is eventually propagated to all the copies. Hence the name <i><b>eventual consistency</b></i>.
</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
When we talk about eventual consistency, we also need to mention the CAP theorem.
</div>
<div style="margin-top:15px;border-top-style: solid;border-top-color: maroon;border-bottom-style: solid;border-bottom-color: maroon;padding-top:10px;padding-bottom:10px;">CAP Theorem</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
The CAP theorem was presented by Eric Brewer in a keynote address to PODC (Principles of Distributed Computing) in 2000. It identifies three important properties of a distributed system: <b>Consistency</b>, <b>Availability</b> and <b>Partition Tolerance</b>. Of these three properties, at most two can be guaranteed at the same time.
</div>
<div style="margin-top:10px;font-size: 1em;line-height:150%;">
Because a distributed database cannot simultaneously provide an always-on experience (availability) and reads that return the latest written version of the data (consistency) in the presence of partial failures (partitions), distributed system architects often sacrifice "strong" consistency to ensure availability and partition tolerance. In other words, they use weaker consistency models, of which <i>eventual consistency</i> is the most notable.
</div>
<div style="margin-top:15px;border-top-style: solid;border-top-color: maroon;border-bottom-style: solid;border-bottom-color: maroon;padding-top:10px;padding-bottom:10px;">Examples</div>
<div style="margin-top:10px;">
<ul style="list-style-type: square;">
<li>DNS</li>
<li>Asynchronous master/slave replication on an RDBMS</li>
<li>Caching in front of relational databases</li>
<li>NoSQL databases</li>
</ul>
</div>
<div style="margin-top:15px;border-top-style: solid;border-top-color: maroon;border-bottom-style: solid;border-bottom-color: maroon;padding-top:10px;padding-bottom:10px;">Variations of Eventual Consistency</div>
<div style="margin-top:10px;">
<ul style="list-style-type: square;">
<li><i><b>Causal consistency</b></i> If process A has communicated to process B that it has updated a data item, a subsequent access by process B will return the updated value, and a write is guaranteed to supersede the earlier write. Access by process C that has no causal relationship to process A is subject to the normal eventual consistency rules. Eventual consistency does not say anything about the ordering of operations, whereas causal consistency ensures that operations appear in the order the user intuitively expects. It enforces a partial order over operations.</li>
<li><i><b>Read-your-writes consistency</b></i> This is an important model where process A, after it has updated a data item, always accesses the updated value and will never see an older value. This is a special case of the causal consistency model.</li>
<li><i><b>Session consistency</b></i> This is a practical version of the previous model, where a process accesses the storage system in the context of a session. As long as the session exists, the system guarantees read-your-writes consistency. If the session terminates because of a certain failure scenario, a new session needs to be created and the guarantees do not overlap the sessions.</li>
<li><i><b>Monotonic read consistency</b></i> If a process has seen a particular value for the object, any subsequent accesses will never return any previous values.</li>
<li><i><b>Monotonic write consistency</b></i> In this case the system guarantees to serialize the writes by the same process. Systems that do not guarantee this level of consistency are notoriously hard to program.</li>
</ul>
</div>
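<div style="margin-top:10px;">
Two of these variations, read-your-writes and monotonic reads, can be approximated on the client side by remembering the newest version a session has written or seen. The sketch below is only one possible approach under simplifying assumptions: the Replica and Session classes are hypothetical, every value carries an integer version, and a stale replica is simply rejected instead of being waited for.
</div>

```python
class Replica:
    """Hypothetical replica holding one versioned value per key."""

    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.store.get(key, (0, None))
        if version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))


class Session:
    """Client-side session that refuses reads older than what it has seen or written."""

    def __init__(self):
        self.min_version = {}  # key -> lowest version this session may observe

    def write(self, replica, key, version, value):
        replica.write(key, version, value)
        # Read-your-writes: later reads must be at least as new as this write.
        self.min_version[key] = max(self.min_version.get(key, 0), version)

    def read(self, replica, key):
        version, value = replica.read(key)
        if version < self.min_version.get(key, 0):
            raise LookupError("replica is stale for this session; try another replica")
        # Monotonic reads: never go back behind a version already observed.
        self.min_version[key] = version
        return value
```

<div style="margin-top:10px;">
The guarantee lasts only as long as the Session object does, which mirrors the session consistency variation: a new session starts with no memory of earlier reads and writes.
</div>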
<div style="margin-top:10px;">
A big advantage of Eventual Consistency is that it is fairly straightforward to implement. To ensure convergence, replicas must exchange information with one another about which writes they have seen. This information exchange process is often called anti-entropy. There are different ways to achieve this. One simple solution is to use an asynchronous all-to-all broadcast. When a replica receives a write to a data item, it immediately responds to the user, then, in the background, sends the write to all other replicas, which in turn update their locally stored data items. In the event of concurrent writes to a given data item, replicas deterministically choose a "winning" value, often using a simple rule such as "last writer wins" (using a clock value embedded in each write).
</div>
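<div style="margin-top:10px;">
The all-to-all broadcast with a "last writer wins" rule described above could look roughly like this. It is a toy sketch: the LwwReplica class is hypothetical, the timestamp is passed in by the caller instead of being read from a clock, and the background propagation is an explicit anti_entropy call.
</div>

```python
class LwwReplica:
    """Hypothetical replica that resolves concurrent writes with last-writer-wins."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.store = {}    # key -> (timestamp, writer, value)
        self.outbox = []   # writes waiting for background propagation

    def client_write(self, key, value, timestamp):
        """Apply locally and return at once; propagation happens later."""
        record = (timestamp, self.name, value)
        self._merge(key, record)
        self.outbox.append((key, record))

    def _merge(self, key, record):
        # Compare by timestamp first, then writer name as a deterministic tie-break.
        if key not in self.store or record[:2] > self.store[key][:2]:
            self.store[key] = record

    def anti_entropy(self):
        """Background step: send pending writes to every other replica."""
        for key, record in self.outbox:
            for peer in self.peers:
                peer._merge(key, record)
        self.outbox.clear()

    def read(self, key):
        return self.store[key][2]
```

<div style="margin-top:10px;">
Between a client_write and the next anti_entropy call, replicas can return different values for the same key. Once every replica has exchanged its writes, they all converge on the write with the highest timestamp.
</div>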
<div style="margin-top:10px;">
Even though eventual consistency does not make any safety guarantees, eventually consistent data stores are widely deployed because they are "good enough", given their latency and availability benefits.
</div>
<div style="margin-top:15px;padding-top:10px;padding-bottom:10px;">References</div>
</div>
<div style="margin-top:10px;">
<ul style="list-style-type: square;">
<li>http://en.wikipedia.org/wiki/Eventual_consistency</li>
<li>http://www.allthingsdistributed.com/2008/12/eventually_consistent.html</li>
<li>Don't Settle for Eventual Consistency - Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, David G. Andersen</li>
<li>Eventual Consistency Today: Limitations, Extensions, and Beyond - Peter Bailis and Ali Ghodsi</li>
</ul>
</div>
<div style="font-size: 120%;">
<b>TierCompilation</b>
</div>
<div>
<i>Posted 2014-09-07</i>
</div>
<div dir="ltr" style="text-align: left;font-family: Georgia,"Times New Roman",serif;font-size: large;" trbidi="on">
<div>
Tiered compilation is a mix of client (C1) and server (C2) compilation. With tiered compilation, code is first compiled by the client compiler. When it becomes hot, it is recompiled by the server compiler.
</div>
<div style="margin-top:10px;">
The goal of tiered compilation is to get the best of both the client (C1) and server (C2) compilers. The client compiler begins compiling sooner than the server compiler does, so code reaches compiled form earlier and the application starts up faster. However, the client compiler applies fewer optimizations, and the code it generates is not as fast as code generated by the server compiler.
</div>
<div style="margin-top:10px;">
Although the server compiler is slower to compile, it produces better-quality code. It waits to gather knowledge about how the code behaves and uses that knowledge to optimize it. It also inlines much more aggressively. Code produced by the server compiler runs faster than code produced by the client compiler. Tiered compilation takes advantage of both: the client compiler's fast startup and the server compiler's peak performance.
</div>
<div style="margin-top:10px;">
In Java 7, the <b>-XX:+TieredCompilation</b> flag is needed to enable tiered compilation. Make sure the server compiler is selected, either with the -server flag or because it is the default for the Java installation being used. In Java 8, tiered compilation is enabled by default.
</div>
</div>
<div style="font-size: 120%;">
<b>An introduction to G1 (Garbage First) Collector</b>
</div>
<div>
<i>Posted 2014-05-11</i>
</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<iframe src="http://www.slideshare.net/slideshow/embed_code/34529769" width="476" height="400" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
<br /></div>
<div style="font-size: 120%;">
<b>Microservice architecture</b>
</div>
<div>
<i>Posted 2014-03-23</i>
</div>
<div dir="ltr" style="text-align: justify;font-family: Georgia,"Times New Roman",serif;font-size: normal;" trbidi="on">
<div>
At QCon London 2014, many speakers mentioned the microservice architecture style. They talked about the problems they faced with monolithic systems and how they solved them with this style of architecture. Martin Fowler, together with James Lewis, has written a <a href="http://martinfowler.com/articles/microservices.html" style="color:blue;">multi-part article</a> about the characteristics of microservices. A monolithic system has several disadvantages:
<ul>
<li>It is difficult to change.</li>
<li>It takes time to add a new feature.</li>
<li>It is difficult to test.</li>
<li>Deployment is a pain.</li>
<li>and many more</li>
</ul>
But this happens in every developer's life, and most of us are still going through it. Business people come to us for something that may generate money for the business, and we start with something small. We keep adding new features to the system, and over time it becomes a monolith with all the characteristics mentioned above. There is another scenario: we spend several months building something and then realize that it is not what the business wants, wasting valuable time and money. Microservice architecture can help us to overcome these situations.
</div><br/>
<div style="color:maroon;font-weight:bold;">
So what is microservice architecture?
</div><br/>
<div style="border-left:5px solid maroon;padding-left:10px;color:maroon;font-weight:italic;">
Microservice Architecture is a concept that aims to decouple a solution by decomposing functionality into discrete services.
</div><br/>
<div>
So rather than putting all functionality in one application, you go for functional decomposition. Each function or capability of the application becomes a service. Each service becomes easier to understand, develop, test and deploy. These services can scale and evolve at their own pace.
</div><br/>
<div>
When developing an application using the microservice architecture style, we need to think about the <span style="font-weight:bold;color:green;">business capabilities</span> of the application.
</div><br/>
<div style="border-top:2px solid maroon;border-bottom:2px solid maroon;text-align:center;padding:10px;">
Each capability represents a service and has its own bounded context.
</div><br/>
<div>
An example: Reward Management System
</div><br/>
<div>
Say ABC company wants to develop a system that gives rewards to its valued customers when they opt into an offer after seeing it in a campaign and perform the activities described in that offer. Once they complete these activities, the system gives them rewards to make them happy.
</div><br/>
<div>
After some brainstorming sessions, the developers of that company conclude that the Reward Management System needs four capabilities:
<ul>
<li>Management of campaigns</li>
<li>Management of offers</li>
<li>Tracking of customers' activities</li>
<li>Give reward</li>
</ul>
</div><br/>
<div>
They could develop one application with all capabilities in one place. But since they want to try the microservice architecture style, they decide to develop an individual service for each capability. So
<ul>
<li>Management of campaigns <span style="padding:0px 10px 0px 10px;color:green;"><i><b>becomes</b></i></span> Campaign Service</li>
<li>Management of offers <span style="padding:0px 10px 0px 10px;color:green;"><i><b>becomes</b></i></span> Offer Service</li>
<li>Tracking of customers' activities <span style="padding:0px 10px 0px 10px;color:green;"><i><b>becomes</b></i></span> Tracking Service</li>
<li>Give reward <span style="padding:0px 10px 0px 10px;color:green;"><i><b>becomes</b></i></span> Reward service</li>
</ul>
</div><br/>
<div style="color:maroon;font-weight:bold;">
Bounded Context
</div><br/>
<div style="border-left:5px solid maroon;padding-left:10px;color:maroon;">
A Bounded Context is an explicit boundary within which a domain model exists. Inside the boundary all terms and phrases of the Ubiquitous Language have specific meaning, and the model reflects the Language with exactness.
</div><br/>
<div>
In the Reward Management System, each capability has its own bounded context. Campaign Service deals only with campaigns: creating, updating, viewing, deleting, approving and publishing them, and associating offers with a campaign. Offer Service manages offers: again creating, updating, viewing and deleting them. Tracking Service tracks which offers a customer has opted into and how far he or she has progressed through the activities required for the reward. Reward Service gives the reward once customers complete their activities. All these services talk to each other to reach a common goal, which is giving rewards to the customers.
</div><br/>
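<div>
To make the split a little more concrete, here is a rough Python sketch of the four services, each owning only the data inside its bounded context. Everything in it is hypothetical: the class and method names are invented for this example, the services call each other directly in one process where a real system would use HTTP or messaging, and storage is a plain in-memory dictionary.
</div><br/>

```python
class OfferService:
    def __init__(self):
        self.offers = {}  # offer_id -> list of required activities

    def create_offer(self, offer_id, required_activities):
        self.offers[offer_id] = list(required_activities)

    def required_activities(self, offer_id):
        return list(self.offers[offer_id])


class CampaignService:
    def __init__(self):
        self.campaigns = {}  # campaign_id -> set of associated offer ids

    def create_campaign(self, campaign_id):
        self.campaigns[campaign_id] = set()

    def associate_offer(self, campaign_id, offer_id):
        self.campaigns[campaign_id].add(offer_id)


class RewardService:
    def __init__(self):
        self.granted = []  # (customer_id, offer_id) pairs

    def give_reward(self, customer_id, offer_id):
        self.granted.append((customer_id, offer_id))


class TrackingService:
    def __init__(self, offer_service, reward_service):
        self.offer_service = offer_service
        self.reward_service = reward_service
        self.progress = {}  # (customer_id, offer_id) -> set of completed activities

    def opt_in(self, customer_id, offer_id):
        self.progress[(customer_id, offer_id)] = set()

    def record_activity(self, customer_id, offer_id, activity):
        done = self.progress[(customer_id, offer_id)]
        required = set(self.offer_service.required_activities(offer_id))
        already_complete = done >= required
        done.add(activity)
        # Reward exactly once, at the moment the last required activity arrives.
        if not already_complete and done >= required:
            self.reward_service.give_reward(customer_id, offer_id)
```

<div>
Tracking Service knows nothing about how offers are stored or how rewards are paid out; it only asks Offer Service what is required and tells Reward Service when a customer is done.
</div><br/>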
<div style="color:maroon;font-weight:bold;">
Benefits
</div><br/>
<div>
This type of architecture style has many benefits:
</div><br/>
<div style="font-weight:bold;">
<i>Reduce complexities to understand</i>
</div><br/>
<div>
In microservice architecture: <span style="color:green;"><i><b>One business capability = One Service</b></i></span>. So services are not supposed to be very big. No service should be so big that it overwhelms your ability to reason about it. <a href="http://bovon.org/index.php/archives/350" style="color:blue;">You need to understand what it does</a>. If a service goes beyond what you can hold in your head, it may be doing more than one thing, and it may be the right time to break it down further. This greatly reduces the effort needed to understand the system. In the Reward Management System example, the development team is not thinking about the full system; most of the time they are thinking about an individual service in isolation.
</div><br/>
<div style="font-weight:bold;">
<i>Easy to change</i>
</div><br/>
<div>
Since each service is small and focuses on one capability of the system, you can change it quite easily. Even after development, if you find that it is not meeting business needs (based on business-relevant metrics), you can throw it away and develop it again.
</div><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeaRJTnm6X8q9sBs0SnEi0ElEmlCy2IFQUBRY-xCoMm72kqFi5r44oaM87RFgNmNk1IOXiirE3L-fMTqRCQZc95-XsCh2hUSijeYIzWBtVhZDIiYyvwxUuEKlIjQaOxfFRsAxBJbORjb0/s1600/Develop-Deploy-Monitor.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeaRJTnm6X8q9sBs0SnEi0ElEmlCy2IFQUBRY-xCoMm72kqFi5r44oaM87RFgNmNk1IOXiirE3L-fMTqRCQZc95-XsCh2hUSijeYIzWBtVhZDIiYyvwxUuEKlIjQaOxfFRsAxBJbORjb0/s320/Develop-Deploy-Monitor.png" /></a></div>
<div style="font-weight:bold;">
<i>Cross functional teams</i>
</div><br/>
<div style="border-left:5px solid maroon;padding-left:10px;color:maroon;">
A cross-functional team is a group of people with different functional expertise working toward a common goal. Cross functional team is a self-directed team. Assigning a task to a team composed of multi-disciplinary individuals increases the level of creativity and out of the box thinking. Each member offers an alternative perspective to the problem and potential solution to the task. <i><a href="http://en.wikipedia.org/wiki/Cross-functional_team" style="color:blue;">[wiki]</a></i>
</div><br/>
<div>
Microservice architecture opens the door to cross-functional teams. These teams bring the full range of skills needed for development: user experience, back-end development, testing, databases etc. They design, build, test and deploy the service. The team takes full responsibility for the software in production.
</div><br/>
<div style="font-weight:bold;">
<i>Choice of technology stack</i>
</div><br/>
<div>
The same technology stack may not be suitable for every type of problem. In microservice architecture, different teams can select different technology stacks for different services, based on the problem and other requirements. For example, one team can go for a typical Java stack (Spring, Hibernate, RDBMS), while another team can go for Node.js and NoSQL. There are also many other JVM-based languages. The point is that in a monolithic architecture we are often stuck with one set of technologies, whereas microservice architecture gives us many options to choose from. <a href="http://memeagora.blogspot.co.uk/2006/12/polyglot-programming.html" style="color:blue;">Polyglot programming</a> and <a href="http://martinfowler.com/bliki/PolyglotPersistence.html" style="color:blue;">polyglot persistence</a> are quite common and easy to do in microservice architecture.
</div><br/>
<div style="font-weight:bold;">
<i>Testing</i>
</div><br/>
<div>
Since services are small and each does one task, testing becomes easier in microservice architecture. We can add many automated tests that run in a continuous delivery pipeline.
</div><br/>
<div style="font-weight:bold;">
<i>Real time Monitoring and Metrics</i>
</div><br/>
<div>
Real-time monitoring and metrics are an important part of microservice architecture. We need to track both architectural metrics (for example, how many requests per second the database is getting) and business-relevant metrics (for example, how many customers opt into an offer per minute). Dashboards can be developed that show the up/down status of each service along with a variety of operational and business-relevant metrics.
</div><br/>
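<div>
As a rough illustration of the two kinds of metrics described above, here is a minimal in-memory counter sketch. The class name <i>ServiceMetrics</i> and the metric names used with it are hypothetical; a real service would normally publish such counters to a monitoring system rather than keep them only in memory.
</div><br/>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: thread-safe named counters for operational
// and business-relevant events.
final class ServiceMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Record one occurrence of the named event, e.g. "db.requests".
    void increment(String metricName) {
        counters.computeIfAbsent(metricName, name -> new LongAdder()).increment();
    }

    // Total number of occurrences recorded so far (0 if never seen).
    long count(String metricName) {
        LongAdder adder = counters.get(metricName);
        return adder == null ? 0L : adder.sum();
    }

    // Average rate over a window the caller has measured, in events per second.
    double ratePerSecond(String metricName, long windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("Window must be positive.");
        }
        return (double) count(metricName) / windowSeconds;
    }
}
```

<div>
A dashboard could read <i>count</i> and <i>ratePerSecond</i> for both a technical metric and a business metric and show them side by side.
</div><br/>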
<div style="color:maroon;font-weight:bold;">
Conclusion
</div><br/>
<div>
There are many other benefits of this architecture style that are not mentioned here. I personally feel that developers can make their professional lives easier and less painful by using the microservice architecture style. After all, simple solutions matter.
</div>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-68306622944862701222014-03-09T15:43:00.002-07:002014-03-09T15:45:02.168-07:00Major Java 8 Features<div dir="ltr" style="text-align: left;font-family: Georgia,"Times New Roman",serif;font-size: large;" trbidi="on">
<div>
Recently I gave a presentation on the major Java 8 features. Here is the presentation:
</div><br/>
<iframe src="http://www.slideshare.net/slideshow/embed_code/32102383" width="476" height="400" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-54085076539281156542014-01-31T14:59:00.001-08:002014-02-02T11:12:04.478-08:00Architectural Views : Context view<div dir="ltr" style="text-align: justify;font-size: large;font-family: Georgia,"Times New Roman",serif;" trbidi="on">
<div>
When describing the architecture of a software system it is useful to show how the system fits in the existing environment (people, systems and external entities with which it interacts). <b>Context view</b> helps us to do this.
</div>
<div style="margin:10px 0px 0px 0px">
The context view of a system defines the relationships, interactions, and dependencies between the system and its environment.
</div>
<div style="margin:10px 0px 0px 0px">
The purpose of the context view is to share the big picture with all stakeholders. It answers questions like:
<ul>
<li>What does the system actually do from a functional point of view?</li>
<li>Who and what other systems are using it?</li>
<li>How is it related to auxiliary systems or services?</li>
</ul>
</div>
<div style="margin:10px 0px 0px 0px">
The <b>context diagram</b> is the key model within a context view. It is easy to draw: just place the system in its environment by
relating it to the different actors (users and auxiliary systems) that it interacts with. A context diagram contains the following elements:
<ul>
<li><b>System</b>: the system that is going to be designed. Hide its internal structure and treat it as a "black box".</li>
<li><b>External Entities</b>: these are the auxiliary systems, services, people and groups that the system interacts with.</li>
<li><b>Connections</b>: these are the interfaces, protocols and connectors that link the external entities and the system being designed.</li>
</ul>
</div>
<div>
Here is a sample context diagram:<br/><br/>
</div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiC_UjTzpaJC6-hwKNjB90IyAsEqcfJl370mXhUhPOrTbsSggawiO9svyEIdLeZwLNw7nzEEptnbRwun9yearotrQeO97d_N6duztSzpDP6vse6wwpxDwqQOyTzyGibVDiMTUB8QUkGegc/s1600/Context+view.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiC_UjTzpaJC6-hwKNjB90IyAsEqcfJl370mXhUhPOrTbsSggawiO9svyEIdLeZwLNw7nzEEptnbRwun9yearotrQeO97d_N6duztSzpDP6vse6wwpxDwqQOyTzyGibVDiMTUB8QUkGegc/s320/Context+view.png" /></a></div>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-5205485426081525752013-11-26T14:39:00.000-08:002013-11-26T14:39:13.942-08:00Characteristics of a productive development team<div dir="ltr" style="text-align: left;font-size: large;font-family: Georgia,"Times New Roman",serif;" trbidi="on">
<div style="margin-bottom:10px;line-height:150%;">
In their <a href="http://martinfowler.com/articles/agileFluency.html" style="color:maroon;">Agile Fluency</a> article, Diana Larsen and James Shore explain that Agile teams develop through four distinct stages of fluency. They define one-star, two-star, three-star and four-star teams based on the team's Agile fluency. It is an interesting article. If you take all the attributes of one-, two- and three-star teams (I will not include four-star, as it is a bit difficult to find a four-star team), you will see that they are the characteristics of a productive development team.
</div>
<div style="margin-bottom:10px;line-height:150%;">
A good and productive development team needs to follow some kind of Agile method. It may be Scrum or Kanban. Ideally they have a product backlog. They use retrospectives to find out what went well and what did not go well in the last iteration. They should keep doing what went well and, at the same time, try to avoid what did not. They should write good user stories. A good user story prevents many misunderstandings, since writing it requires you to interact with different stakeholders in your organization.
</div>
<div style="margin-bottom:10px;line-height:150%;">
Practicing Scrum or Kanban without technical practices such as test-driven development (TDD) is rubbish. Martin Fowler calls it <a href="http://martinfowler.com/bliki/FlaccidScrum.html" style="color:maroon;">FlaccidScrum</a>. One of the characteristics of a productive team is that they write <a href="http://www.martinfowler.com/bliki/SelfTestingCode.html" style="color:maroon;">self testing code</a>. It helps them to reduce their technical debt. Many teams now also practice behavior-driven development (BDD) along with TDD, which is really good. The main point is that a good team should focus on values and quality. It is about collective ownership and shared responsibility.
</div>
<div style="margin-bottom:10px;line-height:150%;">
A productive team does not stop here. They automate their build process and follow the practices that are required to do continuous integration effectively.
</div>
<div>
Another interesting attribute of a productive team is frequent releases. They release as frequently as possible, which helps them get rapid feedback on the new features they have added. This feedback is important for building the right product.
</div>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-47592251332600654952013-11-10T13:32:00.001-08:002013-11-10T13:51:25.104-08:00Media Types<div style="font-family: Georgia,"Times New Roman",serif; font-size: large;">
<div style="margin: 0px 0px 10px 0px;">
Contracts define how different parts of a distributed system should interact and media types play an important part in contracts.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgACYnCtxnbpfxizY3gTsNgCvtfLElhbllTREiP5bqfbiYSHCTIrBTy3inoz8Jsvpjd3st-8uu1mRxcEy6H22btLE_uNlhTKKTs4Zyjy5SBOtZBhF5uDfGbkC0Fd4l9Lr84TfsGeR7Ogg/s1600/contract.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="96" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgACYnCtxnbpfxizY3gTsNgCvtfLElhbllTREiP5bqfbiYSHCTIrBTy3inoz8Jsvpjd3st-8uu1mRxcEy6H22btLE_uNlhTKKTs4Zyjy5SBOtZBhF5uDfGbkC0Fd4l9Lr84TfsGeR7Ogg/s320/contract.png" width="326" /></a></div>
A media type is a combination of formats, processing model, and hypermedia controls.</div>
<div style="margin: 0px 0px 10px 0px;">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiEN90AtPNwHpgSVUJltqlDn2dO-EnyNwg9jyiOc0l1bgGdMCUDXHTpnBEiXAjGDVlpeuljOB_APa10VljGymMNtNuEax22ESGh-ZT-jjr31VFJ0WyAX_t4AAVQ0K4kAUA-AgukS-UdKg/s1600/Media+Type.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="119" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiEN90AtPNwHpgSVUJltqlDn2dO-EnyNwg9jyiOc0l1bgGdMCUDXHTpnBEiXAjGDVlpeuljOB_APa10VljGymMNtNuEax22ESGh-ZT-jjr31VFJ0WyAX_t4AAVQ0K4kAUA-AgukS-UdKg/s320/Media+Type.png" width="320" /></a></div>
</div>
<div style="margin: 0px 0px 10px 0px;">
We can use many standardized media type specifications or create new media types to fit our domain.
</div>
<div style="margin: 0px 0px 10px 0px;">
Standardized media types (e.g., XHTML or Atom) are well-defined and widely understood. Since many systems support these standardized media types, interoperability between them can be easily achieved by using them.
</div>
<div style="margin: 0px 0px 10px 0px;">
Custom media types help us add application-specific semantics on top of generic media type handlers. Jim Webber mentioned a nice example of a custom media type in his <a href="http://jim.webber.name/2008/11/reflections-on-qcon-rest-track/">blog</a>:
</div>
<div style="font-size: 0.9em; margin: 0px 0px 20px 0px;">
<i>
"For example, if you get an Atom representation, then you automatically understand (globally) how to interpret the atom:link links within; with a custom hypermedia type (... application/restbucks+xml) you automatically understand (within the Restbucks context) how to interpret links; but for application/xml you have no idea how to extract hypermedia controls unless you have some prior knowledge of the schema."
</i>
</div>
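<div style="margin: 0px 0px 10px 0px;">
The point of the quote can be sketched in a few lines of Java: a client knows how to find links only for media types whose processing model it understands. The class <i>MediaTypeSupport</i> and the set of supported types are assumptions made for illustration; <i>application/restbucks+xml</i> is simply the custom type named in the quote.
</div>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: which media types does this client know how to
// extract hypermedia controls (links) from?
final class MediaTypeSupport {
    private static final Set<String> HYPERMEDIA_AWARE = new HashSet<>(Arrays.asList(
            "application/atom+xml",       // standardized: atom:link is understood globally
            "application/xhtml+xml",      // standardized: anchors and forms
            "application/restbucks+xml")); // custom: understood within the Restbucks context

    // Returns true when the Content-Type names a media type whose link
    // semantics the client understands. Parameters such as charset are ignored.
    static boolean canInterpretLinks(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String mediaType = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return HYPERMEDIA_AWARE.contains(mediaType);
    }
}
```

<div style="margin: 0px 0px 10px 0px;">
With plain <i>application/xml</i> the method returns false: without prior knowledge of the schema, the client cannot tell which elements are links.
</div>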
<div style="font-size: 0.9em; margin: 0px 0px 10px 0px;">
Reference: REST in Practice - Jim Webber, Savas Parastatidis and Ian Robinson
</div>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-62328417789931467622013-10-19T16:34:00.002-07:002013-10-19T16:34:20.364-07:00Characteristics of different levels in Richardson Maturity Model<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Recently I have been involved in building a Campaign Management System based on Richardson's Level 3 service definition. It's a new thing for my team and we are enjoying it. Leonard Richardson proposed a classification of RESTful web services in his <a href="http://www.crummy.com/writing/speaking/2008-QCon/act3.html" style="color: blue;" target="_blank">talk</a>. He described four levels in his classification. Richardson evaluated service maturity based on three core technologies: URI, HTTP and Hypermedia. Each level builds on the concepts and technologies of the levels below it.
</span></span></div>
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;"><br /></span></span>
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;"><br /></span></span>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio_XZn6WaJVxIc8aWb9tS-m45QHVmFeS1TgpeWPFbpWGjCed8i606IEz1LXZ1HRYhaAoIOp0ZcP3IzUlVYH3BqBR3rqSs4uoygPa9Ryk-9KIQng_4KanYPRakeSeRjSxbtguxPDnvD09I/s1600/RMM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio_XZn6WaJVxIc8aWb9tS-m45QHVmFeS1TgpeWPFbpWGjCed8i606IEz1LXZ1HRYhaAoIOp0ZcP3IzUlVYH3BqBR3rqSs4uoygPa9Ryk-9KIQng_4KanYPRakeSeRjSxbtguxPDnvD09I/s320/RMM.png" /></a></span></span></div>
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;"><br /></span></span>
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;"><br /></span></span>
<div style="line-height: 200%;">
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">So what are the characteristics of these levels?
</span></span></div>
<div style="color: maroon; font-weight: bold; line-height: 200%;">
<span style="font-size: large;"><span style="font-family: Arial,"Times New Roman",serif;">Level 0 Services
</span></span></div>
<div>
<ul style="list-style-type: square;">
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">HTTP is used as a transport system to tunnel requests and responses.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">A single URI.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Use a single HTTP method (typically POST), ignore the rest of the HTTP verbs.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Examples: SOAP, XML-RPC and POX (Plain Old XML).</span></span></li>
</ul>
</div>
<div style="color: maroon; font-weight: bold; line-height: 200%;">
<span style="font-size: large;"><span style="font-family: Arial,"Times New Roman",serif;">Level 1 Services
</span></span></div>
<div>
<ul style="list-style-type: square;">
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Introduce Resource concept.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Employ many URIs and each URI acts as an entry point to a specific resource.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Still a single HTTP verb is used.</span></span></li>
</ul>
</div>
<div style="color: maroon; font-weight: bold; line-height: 200%;">
<span style="font-size: large;"><span style="font-family: Arial,"Times New Roman",serif;">Level 2 Services
</span></span></div>
<div>
<ul style="list-style-type: square;">
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">At this level services host many URI-addressable resources and also support several of the HTTP verbs on each exposed resource.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">The use of GET for requesting a resource is important. HTTP defines GET as a safe and idempotent operation. These properties of GET help us to optimize the services. When a consumer of a resource uses GET, we know the consumer does not want to modify it. We can use caching to store responses closer to our consumers. Subsequent requests can then be served from the caches, which improves the overall quality of the service.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Another important characteristic is the use of status codes. Services use different status codes to respond. When a resource is created, services respond with <i>201 Created</i>, whereas <i>409 Conflict</i> tells the consumer that the request conflicts with the current state of the resource.</span></span></li>
</ul>
</div>
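<div>
To make the Level 2 ideas concrete, here is a minimal in-memory sketch, not a real HTTP server. The class <i>OrderResource</i> and its method names are hypothetical; each method stands in for an HTTP verb and returns the status code a Level 2 service might send.
</div>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Level 2 style resource handler that maps
// outcomes to HTTP status codes.
final class OrderResource {
    private final Map<String, String> orders = new HashMap<>();

    // GET is safe: it only reads state, so repeating it changes nothing.
    int get(String id) {
        return orders.containsKey(id) ? 200 : 404; // 200 OK or 404 Not Found
    }

    // Creating a resource answers 201 Created, or 409 Conflict when the
    // request clashes with the current state (the id already exists).
    int create(String id, String body) {
        if (orders.containsKey(id)) {
            return 409;
        }
        orders.put(id, body);
        return 201;
    }

    // DELETE answers 204 No Content on success, 404 if nothing was there.
    int delete(String id) {
        return orders.remove(id) != null ? 204 : 404;
    }
}
```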
<div style="color: maroon; font-weight: bold; line-height: 200%;">
<span style="font-size: large;"><span style="font-family: Arial,"Times New Roman",serif;">Level 3 Services
</span></span></div>
<div>
<ul style="list-style-type: square;">
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Support HATEOAS (Hypermedia As The Engine Of Application State)</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Now the representations contain URI links. These links may point to other resources (which may be interesting to the consumers) or they may represent a transition to a possible future state of the current resource. One important thing to notice here: the service tells the consumer what it can do next through these links.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Here the consumer submits an initial request to the entry point of the service. The service handles the request and responds with a resource representation populated with links. The consumer chooses one of these links to transition to the next step in the interaction. Over the course of several such interactions, the consumer progresses toward its goal. In this way the distributed application's state gets changed.</span></span></li>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">Consumers in a hypermedia system cause state transitions by visiting and manipulating resource state.</span></span></li>
</ul>
</div>
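<div>
The interaction described above can be sketched as follows. This is only an illustration under assumed names: <i>OrderRepresentation</i>, <i>OrderService</i>, the link relations and the URIs are all hypothetical. The idea it shows is that the service, not the consumer, decides which links appear in each state.
</div>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical representation of an order: its state plus the links
// (relation name to URI) that the service advertises.
final class OrderRepresentation {
    private final String status;
    private final Map<String, String> links = new LinkedHashMap<>();

    OrderRepresentation(String status) {
        this.status = status;
    }

    OrderRepresentation withLink(String rel, String uri) {
        links.put(rel, uri);
        return this;
    }

    String status() {
        return status;
    }

    // The consumer looks up a transition by relation name; null means
    // the service does not offer that transition in the current state.
    String linkFor(String rel) {
        return links.get(rel);
    }
}

final class OrderService {
    // The service populates the representation with the transitions
    // that are valid for the current state of the resource.
    static OrderRepresentation represent(String orderId, String status) {
        OrderRepresentation representation = new OrderRepresentation(status)
                .withLink("self", "/orders/" + orderId);
        if ("unpaid".equals(status)) {
            representation.withLink("payment", "/orders/" + orderId + "/payment");
        }
        return representation;
    }
}
```

<div>
A consumer that receives an unpaid order finds a <i>payment</i> link and can follow it; once the order is paid the link is no longer advertised, so the consumer has nothing to follow.
</div>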
<div style="font-family: Arial; font-size: 0.8em; line-height: 200%;">
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">References:
</span></span><br />
<ul style="list-style-type: square;"><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">
</span></span>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">REST in Practice - Jim Webber, Savas Parastatidis, Ian Robinson</span></span></li>
<span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;">
</span></span>
<li><span style="font-size: large;"><span style="font-family: Georgia,"Times New Roman",serif;"><a href="http://martinfowler.com/articles/richardsonMaturityModel.html" style="color: blue;" target="_blank">Martin Fowler's article on Richardson Maturity Model</a></span></span></li>
</ul>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-83653736962021982142013-06-22T16:00:00.000-07:002013-06-23T14:45:28.712-07:00Self Encapsulation<div dir="ltr" style="text-align: left;font-size: large;font-family: Georgia,"Times New Roman",serif;" trbidi="on">
<i><b>What is Self Encapsulation?</b></i>
<p>
Martin Fowler mentioned this in his <a href="http://martinfowler.com/bliki/SelfEncapsulation.html">bliki</a> : <br/><br/>
<i>"Self Encapsulation is designing your classes so that all access to data, even from within the same class, goes through accessor methods."</i>
</p>
<p>
This is also called Self Delegation. Take this simple Email example:
</p>
<pre>
<p style="background-color:#000000;font-family:courier;width:100%;color:white;font-size:0.9em;padding:5px 5px 5px 20px;">
import java.io.Serializable;

public final class Email implements Serializable {
    private static final long serialVersionUID = 1L;
    private String emailAddress;

    public Email(String anEmailAddress) {
        super();
        // Self encapsulation: the constructor delegates to its own setter.
        this.setEmailAddress(anEmailAddress);
    }

    public Email(Email anEmail) {
        this(anEmail.getEmailAddress());
    }

    public String getEmailAddress() {
        return this.emailAddress;
    }

    private void setEmailAddress(String anEmailAddress) {
        if (anEmailAddress == null) {
            throw new IllegalArgumentException(
                    "Email address must not be null.");
        }
        if (anEmailAddress.length() == 0) {
            throw new IllegalArgumentException(
                    "Email address is required.");
        }
        if (!java.util.regex.Pattern.matches(
                "\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*",
                anEmailAddress)) {
            throw new IllegalArgumentException(
                    "Email address format is invalid.");
        }
        // All guards passed, so the value can be assigned.
        this.emailAddress = anEmailAddress;
    }
}
</p>
</pre>
<p>
In the above example, the constructor delegates assignment of the instance variable <i>emailAddress</i> to its own internal property setter. Here the setter method not only sets the email address but also performs an important <b><i>assertion</i></b>: it guards against invalid data. Self-encapsulation lets the setter method enforce the appropriate contractual conditions for setting the email address. This is the advantage of using Self Encapsulation.
</p>
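<p>
The same idea applies to any field with an invariant, not only to constructors. The sketch below uses a hypothetical <i>OrderLine</i> class (it is not from either reference) to show a second method inside the class going through the accessors, so the guard runs on every change.
</p>

```java
// Hypothetical sketch: every write to quantity, even from inside the
// class, goes through setQuantity so the guard cannot be bypassed.
final class OrderLine {
    private int quantity;

    OrderLine(int aQuantity) {
        this.setQuantity(aQuantity);
    }

    int getQuantity() {
        return this.quantity;
    }

    void increaseBy(int anAmount) {
        // Uses the getter and setter rather than touching the field directly.
        this.setQuantity(this.getQuantity() + anAmount);
    }

    private void setQuantity(int aQuantity) {
        if (aQuantity <= 0) {
            throw new IllegalArgumentException("Quantity must be positive.");
        }
        this.quantity = aQuantity;
    }
}
```

<p>
Because <i>increaseBy</i> delegates to the setter, an adjustment that would drive the quantity to zero or below is rejected in exactly the same way as an invalid constructor argument.
</p>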
<br/><br/>
<i>References</i><br/>
1. http://martinfowler.com/bliki/SelfEncapsulation.html<br/>
2. Implementing Domain-Driven Design - Vaughn Vernon<br/>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-66225545066305944772013-06-08T14:22:00.000-07:002013-06-08T14:22:33.874-07:00Template I follow in writing user story<div dir="ltr" style="font-family: Georgia, "Times New Roman", serif;font-size:large;text-align: left;" trbidi="on">
<p>
A user story is one of the important items in your agile toolkit. A good user story will drive you to solve the right problem. I like writing user stories. For me it is a discovery phase. While writing user stories I have found many small but important requirements that were previously unknown (some of my aha moments!!). Here is the template I follow when I write a user story:
</p>
<p style="background-color:black;color:white;width:100%;padding:10px 10px 10px 10px;font-family:courier;font-size:0.8em;">
User Story:<br/><br/>
Title: (one line describing the story)<br/><br/>
As a <b>{role}</b> I want to <b>{action}</b> so that <b>{benefit}</b>
<br/><br/>
Notes (or Scopes):<br/><br/>
Add any relevant background information, specific algorithms or formulas, conversation etc.<br/><br/>
Acceptance Criteria:<br/><br/>
Given {context/system status}<br/>
when I {input/action}<br/>
then I should {result}<br/>
</p>
<p>
When I write the user story I think about the <b>benefit</b> or <b>business value</b> that I am going to add by implementing this feature. Even when my product owner writes the user story, I discuss the benefit of the feature with him. It gives me an opportunity to get a good understanding of the requirement. It also helps to find out the required <i>definition of done</i>.
</p>
<p>
<b>Role</b> helps me to find out my primary user. Sometimes I find it easily just by discussing with the product owner; sometimes I talk to different stakeholders to find it out. You may face a similar situation, so take your time over it.
</p>
<p>
<b>Action</b> outlines the main flow of interaction which needs to be addressed.
</p>
<p>
<b>Notes or scopes</b> are optional for me; I do not always need them. They are useful when I work on a complex problem that requires further discussion. In these discussions I may come across important information and references, which I write under this section for future reference.
</p>
<p>
<b>Acceptance criteria</b> are another important part of my user story. In this section I write down the expected behaviour and corner cases. I review them with the product owner and tester. Acceptance criteria help me to reduce ambiguity, and they give me a sense of done once I complete code that meets the criteria. I write them in BDD (Given-When-Then) format. One thing to remember: you may not have all the acceptance criteria when implementation starts, and you should not expect them to remain static. They may change, so adjust them accordingly.
</p>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-32350644313283952922013-06-05T15:59:00.000-07:002013-06-05T15:59:11.023-07:00DDD Note: Domain<div dir="ltr" style="text-align: left;font-family: Georgia,"Times New Roman",serif;font-size: large;" trbidi="on">
<p>
<b><i>Domain</i></b> is a sphere of knowledge or activity. For instance, you go to your favourite superstore to buy some products. The superstore buys these products from different sources and sells them to its customers. This superstore has its own unique business knowledge and way of doing things. This understanding, together with its methods for carrying out its activities or operations, is its Domain. If this superstore asks you to develop software for it, then you will be working in its domain.
</p>
<p>
It is rare to find a business that has only one function. There are different functions that make a business successful. It is always good to think about each of those business functions separately as a <b>Subdomain</b>. So a domain consists of multiple subdomains. In our superstore example, we can say that it has four subdomains: <i>Product Catalog, Orders, Invoicing, and Shipping</i>.
</p>
<p>
Some subdomains can be labeled as core domains. A <b>Core Domain</b> is a part of the business domain that is most important. The success of the business mainly depends on it. It deserves most of your attention and resources.
</p>
<p>
There are two other types of subdomains: <b>Supporting Subdomain</b> and <b>Generic Subdomain</b>. On many occasions you will find that there are services created or acquired to support the business. If a subdomain models some aspect of the business that is essential, yet not Core, it is a Supporting Subdomain. Supporting Subdomains provide specialized functionality, whereas a Generic Subdomain captures those activities that are not special but are required for the overall business solution.
</p>
<p>
Remember Supporting and Generic subdomains are not unimportant and they also deserve attention from you. But there is no need for the business to excel in these areas. It is the Core Domain that will provide distinct advantages to the business, hence it requires excellence in implementation.
</p>
<br />
<b>References: </b>
<ol>
<li>Domain-Driven Design - By Eric Evans</li>
<li>Implementing Domain-Driven Design - By Vaughn Vernon</li>
</ol>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-89400956376245009342013-05-31T13:36:00.001-07:002013-05-31T13:36:34.824-07:007 tips to make your tests readable<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;font-size: large;font-family: Georgia,"Times New Roman",serif;">
<p>
If you want to make your TDD sustainable, then please give importance to the readability of your tests. When a programmer reads your test, he or she needs to understand the purpose of the test. No developer likes to stop and puzzle through a test to figure out what it does. You can reduce the cognitive load on your reader by making your tests readable. Here are some tips that you can use to improve the readability of your tests. I hope you will find them useful.
</p>
<i>Tip 1: Give importance to test name</i>
<p>
By choosing the right name for your test, you give your reader the first clue about the intention of your test and how the target object is supposed to behave. Try to select a name that says something about the scenario and the expected behaviour.
</p>
<i>Tip 2: Structure your unit test</i>
<p>
Try to follow the "Arrange, Act, Assert" pattern to structure your unit test. "Arrange, Act, Assert" basically means that you organize your tests so that you first arrange the objects used in the test, then trigger the action, and finally make assertions about the outcome. You can add whitespace between these three segments to help others understand your tests more easily. For example:
<pre>
<div style="width: 100%; background-color: black; color: white; font-size:0.9em; font-family: "Courier New";">
@Test
public void shouldFindCustomerByUsername() {
    // Arrange
    when(customerDaoMock.findByUsername("jonsmi"))
            .thenReturn(getFakeCustomer());

    // Act
    Customer customer = customerService
            .findCustomerByUsername("jonsmi");

    // Assert
    assertThat(customer.getId(), is(101L));
    assertThat(customer.getUsername(), is("jonsmi"));
}
</div>
</pre>
</p>
<i>Tip 3: Put emphasis on "what" over "how"</i>
<p>
Try to emphasize "what" over "how", even in your test code. Move unnecessary implementation details out of your test code. These details create noise, which makes it harder for your reader to understand what is important in your test. Also try to use <b>Hamcrest</b> matcher utilities such as assertThat, is, anything, notNullValue, hasItem etc. JUnit currently ships with Hamcrest. These utilities help you express your intent clearly and reduce verbosity in your tests. For instance, instead of doing this:
<pre>
<div style="width: 100%; background-color: black; color: white; font-size:0.9em; font-family: "Courier New";">
assertTrue(activities.contains("PLAY"));
assertTrue(activities.contains("READING"));
assertTrue(activities.contains("WRITING"));
</div>
</pre>
You can do this:
<pre>
<div style="width: 100%; background-color: black; color: white; font-size:0.9em; font-family: "Courier New";">
assertThat(activities, hasItems("PLAY", "READING", "WRITING"));
</div>
</pre>
</p>
<i>Tip 4: Extract common features into methods that can be shared</i>
<p>
Many times we write the same thing again and again in our test methods. Remember the DRY (Don't Repeat Yourself) principle. Extract common or nonessential features into private helpers and setup methods. But be careful not to make your tests so abstract that future readers do not understand what the tests do.
</p>
<i>Tip 5: A test should only check one thing and check it well</i>
<p>
When you put multiple tests in a single test method, you are going to confuse others. If you have a big test method, split it into smaller test methods. Each test should focus on a single fixture. By doing this, you will improve readability significantly. An additional benefit is that when a test fails, you only need to look at a smaller portion of code to find the reason. So it improves maintainability as well.
</p>
<i>Tip 6: Try to avoid magic numbers</i>
<p>
I think every programmer agrees that magic numbers are bad and should be avoided. Replace them with constants or variables whose names convey the intended meaning, which makes the code easier to read.
</p>
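<p>
Here is a small sketch of the difference. It is written in plain Java without JUnit so that it stands alone, and the <i>ShippingCalculator</i> class, its constants and the amounts are all made up for illustration. In a real JUnit test the same named constants would simply appear inside the assertions.
</p>

```java
// Hypothetical production code with its business numbers named.
final class ShippingCalculator {
    static final int FREE_SHIPPING_THRESHOLD_PENCE = 5_000;
    static final int STANDARD_SHIPPING_PENCE = 399;

    static int shippingCostFor(int orderTotalPence) {
        return orderTotalPence >= FREE_SHIPPING_THRESHOLD_PENCE
                ? 0
                : STANDARD_SHIPPING_PENCE;
    }
}

// Hypothetical check written the way a readable test would be:
// the reader sees why 4999 matters instead of guessing.
final class ShippingCalculatorCheck {
    static final int TOTAL_JUST_BELOW_THRESHOLD =
            ShippingCalculator.FREE_SHIPPING_THRESHOLD_PENCE - 1;

    static boolean chargesStandardShippingJustBelowThreshold() {
        return ShippingCalculator.shippingCostFor(TOTAL_JUST_BELOW_THRESHOLD)
                == ShippingCalculator.STANDARD_SHIPPING_PENCE;
    }
}
```

<p>
Compare this with a bare comparison of <i>shippingCostFor(4999)</i> against <i>399</i>, which forces the reader to work out what both numbers mean.
</p>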
<i>Tip 7: Simplify your setup method</i>
<p>
Do not dump everything in your setup method (annotated with @Before or @BeforeClass). Doing so makes the setup method overcomplicated. It can also indicate a design problem that forces the test to do so much work just to get an object ready for testing. Extract all the nonessential details from the setup into private methods. Give appropriate and descriptive names to the variables and methods used in the setup method.
</p>
<br/>
<i>References: </i>
<ol>
<li>Growing Object-Oriented Software, Guided By Tests - Steve Freeman, Nat Pryce</li>
<li>Effective Unit Testing - Lasse Koskela</li>
<li><a href="http://code.google.com/p/hamcrest/wiki/Tutorial">Hamcrest</a></li>
<li><a href="http://junit.org/">JUnit</a></li>
<li><a href="http://testng.org">TestNG</a></li>
</ol>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/02496274832725249460noreply@blogger.com0tag:blogger.com,1999:blog-1558947212053086437.post-14257125044605797852013-03-31T13:37:00.000-07:002013-03-31T13:37:04.433-07:00Things to remember when REST-ing<div dir="ltr" style="text-align: left;font-family: Georgia, "Times New Roman", serif; font-size: large;" trbidi="on">
<p>
There are many good blogs and books written on designing and developing RESTful systems. I have read some of them. In this post I am sharing some notes that I took during that time. Please let me know if I have missed anything important (as it is a huge subject). I will update this post accordingly.
</p>
<b><span style="color: #990000;">Resource Identification</span></b>
<p>
Every important resource in a RESTful system must have an identifier. Resource identification is a very important step in developing a RESTful system. We can use a URI to identify a resource. A URI uniquely identifies a resource and distinguishes it from any other resource. A URI can identify a single resource or a collection of resources. For example:
</p>
<small>
<i>http://myschool.com/courses/123 [identify course no 123]</i><br />
<i>http://mystore.com/orders/2013/03 [identify all orders of Mar, 2013]</i>
</small>
<p>
A resource can have more than one URI, i.e. a resource can be identified in more than one way, but a URI always identifies one resource. Try to use simple URIs. A simple URI is easy to remember and is always a good choice, whether the resource will be read by a human or processed by a machine.
</p>
<b><span style="color: #990000;">Resource Representation</span></b>
<p>
Support one or more representations of a resource. What is a representation?
</p>
<small>
<i>A representation is a transformation or a view of a resource's state at an instant in time.</i>
</small>
<p>
Each resource's identifier (for example, its URI) is associated with one or more representations. We can use XHTML, Atom, XML, JSON, plain text, CSV, MP3, or JPEG to achieve this. These are called transferable or representation formats. On the web, different systems exchange representations; they do not access the underlying resource directly. URIs relate, connect, and associate representations with their resources on the web.
</p>
<p>
Do not encourage consumers to terminate URIs with .json or .xml to get a resource representation in their preferred format; use <b><i>content negotiation</i></b> instead. With <b><i>content negotiation</i></b>, consumers ask a service for specific representation formats by sending the <b>HTTP Accept request header</b> with a list of media types they are prepared to process. But be careful: the service does not have to oblige the consumer's request. It may send the representation as XML even though the consumer asked for JSON, so always check the Content-Type header in the response.
</p>
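<p>
Here is a minimal sketch of how a service might pick a representation format from the Accept header. It is only an illustration under stated assumptions: the <code>MediaTypeNegotiator</code> class and its <code>choose</code> method are names made up for this example (they do not come from any library), the service is assumed to produce only JSON and XML, and the parsing is simplified (partial wildcards such as <code>text/*</code> and other media type parameters are ignored).
</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of any library: picks a response media type. */
final class MediaTypeNegotiator {

    // Formats this imaginary service can produce; the first one is the default.
    private static final List<String> SUPPORTED =
            Arrays.asList("application/json", "application/xml");

    /** Returns the supported media type with the highest q-value, or the default. */
    static String choose(String acceptHeader) {
        String best = null;
        double bestQ = 0.0;
        if (acceptHeader != null) {
            for (String part : acceptHeader.split(",")) {
                String[] tokens = part.trim().split(";");
                String type = tokens[0].trim().toLowerCase(Locale.ROOT);
                double q = 1.0; // a media type without a q parameter has quality 1
                for (int i = 1; i < tokens.length; i++) {
                    String param = tokens[i].trim();
                    if (param.startsWith("q=")) {
                        try {
                            q = Double.parseDouble(param.substring(2));
                        } catch (NumberFormatException e) {
                            q = 0.0; // treat a malformed q-value as "not acceptable"
                        }
                    }
                }
                if (type.equals("*/*")) {
                    type = SUPPORTED.get(0); // wildcard: the server default will do
                }
                if (SUPPORTED.contains(type) && q > bestQ) {
                    best = type;
                    bestQ = q;
                }
            }
        }
        // No acceptable match: this sketch falls back to the default instead of failing.
        return best != null ? best : SUPPORTED.get(0);
    }

    public static void main(String[] args) {
        System.out.println(choose("application/xml;q=0.5, application/json"));
        System.out.println(choose("text/csv"));
    }
}
```

<p>
Note the last line of <code>choose</code>: when nothing matches, this sketch answers in its default format rather than rejecting the request. That is exactly the situation in which a consumer that asked for one format receives another, and the reason to check the Content-Type of every response.
</p>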
<b><span style="color: #990000;">Utilize Link</span></b>
<p>
Use links to drive application state. This is the core of <b>HATEOAS (<i>Hypermedia as the engine of application state</i>)</b>. In a hypermedia system, application state is communicated through representations of uniquely identifiable resources. The client submits an initial request to the entry point of the service. The service handles the request and responds with a resource representation populated with links. The client chooses one of these links to transition to the next step in the interaction, and progresses toward its goal through several such interactions. In this process the application's state changes. So the change of application state depends on the service, the client, the exchange of hypermedia-enabled resource representations, and the advertisement and selection of links. The link approach is beautiful because a link can also point to a resource provided by a different application, or maybe even by a different company.
</p>
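<p>
The interaction described above can be sketched in a few lines. Everything here is hypothetical: <code>OrderRepresentation</code>, <code>OrderingService</code>, the link relation names (<code>payment</code>, <code>cancel</code>, <code>receipt</code>) and the URI paths under mystore.com are invented for illustration. The point is only that the service decides which links to advertise, and the client follows a link by its relation name instead of building URIs itself.
</p>

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical representation of an order, populated with hypermedia links. */
final class OrderRepresentation {
    private final String status;
    private final Map<String, String> links; // link relation -> URI

    OrderRepresentation(String status, Map<String, String> links) {
        this.status = status;
        this.links = Collections.unmodifiableMap(new LinkedHashMap<>(links));
    }

    String status() {
        return status;
    }

    /** The client looks up a transition by relation name, never by a hard-coded URI. */
    Optional<String> linkFor(String rel) {
        return Optional.ofNullable(links.get(rel));
    }
}

/** Stands in for the server side: the links offered depend on the resource state. */
final class OrderingService {
    static OrderRepresentation represent(String orderId, boolean paid) {
        String self = "http://mystore.com/orders/" + orderId;
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", self);
        if (paid) {
            links.put("receipt", self + "/receipt");
        } else {
            links.put("payment", self + "/payment");
            links.put("cancel", self);
        }
        return new OrderRepresentation(paid ? "paid" : "payment-expected", links);
    }

    public static void main(String[] args) {
        OrderRepresentation order = represent("42", false);
        // The client only proceeds if the service advertises the transition.
        order.linkFor("payment").ifPresent(uri -> System.out.println("Next step: " + uri));
    }
}
```

<p>
Once the order is paid, the <code>payment</code> link simply disappears from the representation, so a well-behaved client cannot attempt that transition any more.
</p>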
<b><span style="color: #990000;">Resource state is not same as Application state</span></b>
<p>
Roy Fielding mentioned this (<a href="http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven#comment-744" target="_blank">in the comment section</a>) in one of his blog posts:<br/><br/>
<small>
<i>
Don't confuse application state (the state of the user's application of computing to a given task) with resource state (the state of the world as exposed by a given service). They are not the same thing.
</i>
</small><br/><br/>
Resource state and application state are two different things, and they should not be confused. When the service and consumer interact, they exchange representations of <b><i>resource state</i></b>, not <b><i>application state</i></b>. Application state is defined by a representation that is handed to a consumer by the service. When a consumer makes a request, it gets a small subset of the overall server state, plus certain transitions (represented as links) to other application states that are offered by the service. Please see another comment by Roy <a href="http://lists.w3.org/Archives/Public/www-tag/2010Oct/0100.html" target="_blank">here</a> on this point.
</p>
<b><span style="color: #990000;">Do not ignore or misuse HTTP status codes</span></b>
<p>
When a client makes a request, the server returns an HTTP status code in its response. This status code is important because it tells the client what happened to the request. There are five categories of status codes: the 1XX range for informational responses, 2XX for success, 3XX for redirection, 4XX for client errors, and 5XX for server errors. Do not mix them up, because each category helps clients deal with a different scenario. For example, it is not good to send a 200 status code with an error message in the response body when an error has happened. Used properly, these codes improve reusability and interoperability and promote loose coupling.
</p>
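<p>
To show why the ranges matter to a client, here is a small sketch that maps a status code to its category. The <code>StatusCategory</code> enum is a made-up name for this post, not a type from any HTTP library. A client that can rely on the category is able to react sensibly even to a specific code it has never seen before.
</p>

```java
/** Hypothetical enum for this example: the five classes of HTTP status codes. */
enum StatusCategory {
    INFORMATIONAL, SUCCESS, REDIRECTION, CLIENT_ERROR, SERVER_ERROR;

    /** Maps a status code to its category using the first digit. */
    static StatusCategory of(int statusCode) {
        switch (statusCode / 100) {
            case 1: return INFORMATIONAL;
            case 2: return SUCCESS;
            case 3: return REDIRECTION;
            case 4: return CLIENT_ERROR;
            case 5: return SERVER_ERROR;
            default:
                throw new IllegalArgumentException("Not an HTTP status code: " + statusCode);
        }
    }

    /** True for the two error ranges, 4XX and 5XX. */
    boolean isError() {
        return this == CLIENT_ERROR || this == SERVER_ERROR;
    }

    public static void main(String[] args) {
        // A service that reports failures with 200 would defeat this check.
        System.out.println(of(404) + " error? " + of(404).isError());
    }
}
```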
<b><span style="color: #990000;">Use caching</span></b>
<p>
Caching helps us increase the scalability of a RESTful system by storing copies of frequently accessed data in several places along the request-response path. Caching has several benefits: it reduces bandwidth use, latency, and load on servers, and it helps to hide network failures. Using HTTP headers, an origin server indicates whether a response can be cached and, if so, by whom and for how long. Caches along the response path can take a copy of a response (provided that the caching metadata allows it). The caches can then use these copies to satisfy subsequent requests.
</p>
<p>
There are two main HTTP response headers that can be used to control the caching behaviour:
</p>
<i>
<b>Expires:</b> The Expires HTTP header specifies an absolute expiry time for a cached representation. After that time, a cached representation is considered stale and must be revalidated with the origin server. A service can indicate that a representation has already expired by including an Expires value equal to the Date header value, or a value of 0. To indicate that a representation never expires, a service can include a time up to one year in the future.
</i><br /><br />
<i>
<b>Cache-Control:</b> The Cache-Control header can be used in both requests and responses to control the caching behaviour of the response. The header value comprises one or more comma-separated directives that determine whether a response is cacheable, by whom, and for how long.
</i>
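<p>
As a rough illustration of how a cache might read these directives, the sketch below decides whether a stored response may be served without contacting the origin server. It makes several simplifying assumptions: <code>CacheFreshness</code> is a made-up class, only the <code>no-store</code>, <code>no-cache</code> and <code>max-age</code> response directives are considered, and the Expires header and every other directive are ignored.
</p>

```java
import java.util.Locale;

/** Hypothetical helper: a much-simplified freshness check for a cached response. */
final class CacheFreshness {

    /**
     * Returns true when a stored response with the given Cache-Control value and
     * current age (in seconds) may be served without revalidation.
     */
    static boolean canServeFromCache(String cacheControl, long ageSeconds) {
        if (cacheControl == null) {
            return false; // no caching metadata: this sketch stays conservative
        }
        long maxAge = -1; // -1 means "no max-age directive seen"
        for (String raw : cacheControl.split(",")) {
            String directive = raw.trim().toLowerCase(Locale.ROOT);
            if (directive.equals("no-store") || directive.startsWith("no-cache")) {
                return false; // must not be stored, or must be revalidated first
            }
            if (directive.startsWith("max-age=")) {
                try {
                    maxAge = Long.parseLong(directive.substring("max-age=".length()));
                } catch (NumberFormatException e) {
                    return false; // malformed value: treat the response as stale
                }
            }
        }
        // Fresh only while the age is below the advertised lifetime.
        return maxAge >= 0 && ageSeconds < maxAge;
    }

    public static void main(String[] args) {
        System.out.println(canServeFromCache("public, max-age=60", 30));
        System.out.println(canServeFromCache("public, max-age=60", 90));
    }
}
```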
<p>
Cacheable responses (whether to a GET or to a POST request) should also include a <i>validator</i>: either an ETag or a Last-Modified header.
</p>
<i>
<b>ETag:</b> An ETag is useful for validating the freshness of a cached representation of a resource. An ETag value is an opaque string token that a server associates with a resource to uniquely identify the state of the resource over its lifetime. When the resource changes, the ETag changes accordingly.
</i><br /><br />
<i>
<b>Last-Modified:</b> The Last-Modified header indicates when the associated resource last changed. The Last-Modified value cannot be later than the Date value.
</i>
<p>
If a consumer wants to revalidate a response, it should include a <i>Cache-Control: no-cache</i> directive in its request. This ensures that the conditional request travels all the way to the origin server, rather than being satisfied by a cache.
</p>
<p>
When doing validation, use <b>conditional GET</b>s. A conditional GET exchanges only HTTP headers, rather than headers and entity bodies, as long as the cached representation is still current; entity bodies are exchanged only when a cached resource representation is out of date. Conditional GETs are useful only when the client making the request has previously fetched and held a copy of a resource representation along with its <i>ETag</i> or <i>Last-Modified</i> value. The consumer or cache sends a previously received <i>ETag</i> value in an <i><b>If-None-Match</b></i> header, or a previously supplied <i>Last-Modified</i> value in an <i><b>If-Modified-Since</b></i> header. If the resource hasn't changed (that means its <i>ETag</i> or <i>Last-Modified</i> value is the same as the one supplied by the consumer), the service replies with <i><b>304 Not Modified</b> (plus any ETag or Location headers)</i>. If the resource has changed, the service sends back a full representation with a <i><b>200 OK</b></i> status code.
</p>
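<p>
The service side of a conditional GET can be sketched as a single decision. This is an illustration only: <code>ConditionalGet</code> and <code>statusFor</code> are invented names, only the ETag / If-None-Match pair is handled (not If-Modified-Since), and the header is split naively on commas.
</p>

```java
/** Hypothetical helper: decides between 200 and 304 for a conditional GET. */
final class ConditionalGet {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;

    /**
     * @param ifNoneMatch value of the If-None-Match request header, or null if absent
     * @param currentEtag ETag of the resource's current state, including its quotes
     */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || currentEtag == null) {
            return OK; // nothing to validate against: send the full representation
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || opaqueTag(tag).equals(opaqueTag(currentEtag))) {
                return NOT_MODIFIED; // the consumer's copy is current: headers only
            }
        }
        return OK; // the resource changed: send headers and entity body
    }

    /** If-None-Match compares weakly, so a leading W/ marker is ignored. */
    private static String opaqueTag(String etag) {
        return etag.startsWith("W/") ? etag.substring(2) : etag;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("\"v1\"", "\"v1\""));
        System.out.println(statusFor("\"v1\"", "\"v2\""));
    }
}
```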
<p>
Consumers can also influence cache behaviour by sending Cache-Control directives in requests: <i><b>max-age</b>, <b>max-stale</b>, <b>min-fresh</b>, <b>only-if-cached</b>, <b>no-cache</b>, <b>no-store</b></i>.
</p>
<b><span style="color: #990000;">Other things to consider:</span></b>
<ul style="list-style-type:square">
<li>Use a hypermedia-aware media type such as HTML, XHTML, SVG, Atom</li>
<li>Do not tunnel updates through GET</li>
<li>Use self-descriptive messages</li>
<li>Try to avoid chattiness (too many round trips)</li>
<li>Try to accept and support compression as defined by the HTTP 1.1 specification</li>
<li>Try to render relative links where possible</li>
<li>Try to implement paged representation where applicable</li>
<li>Do not misuse cookies</li>
<li>Think about security</li>
</ul>
<p>
<b>References:</b>
<ul style="list-style-type:square">
<li><a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm">Fielding, Roy Thomas: Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine, 2000</a></li>
<li>REST in Practice - Jim Webber, Savas Parastatidis &amp; Ian Robinson</li>
<li><a href="http://www.infoq.com/articles/rest-introduction">A Brief Introduction to REST</a> by Stefan Tilkov</li>
<li><a href="http://vimeo.com/20781278#at=0">Hypermedia APIs</a> by Jon Moore</li>
<li><a href="http://www.infoq.com/articles/rest-anti-patterns">REST Anti-Patterns</a> by Stefan Tilkov</li>
<li><a href="http://www.w3.org/Provider/Style/URI.html.en">Cool URIs don't change</a></li>
</ul>
</p>
</div>
Entities, Value Objects and Services (posted 2012-09-15)
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="font-family: "Trebuchet MS",sans-serif; font-size: 11pt; text-align: left;" trbidi="on">
In chapter five, <i>A Model Expressed in Software</i>, of <a href="http://domaindrivendesign.org/books/evans_2003">Domain-Driven Design</a>, Eric Evans writes about ENTITIES, VALUE OBJECTS and SERVICES. These are three important patterns of DDD. They help us capture important concepts of the domain, and classifying objects in this way makes them less ambiguous. In this post I write down my understanding of this topic.
<br />
<br />
<b><i>
So what is an ENTITY?
</i></b>
<br />
<br />
<div style="border: 2px solid maroon; letter-spacing: 4px;">
<center>
<span style="padding-bottom: 10px; padding-top: 10px;">ENTITY = IDENTITY + CONTINUITY</span>
</center>
</div>
<br />
An ENTITY is an object that has a distinct identity and continuity. This type of object is not fundamentally defined by its attributes. For example, two customers can have the same name and age, or even live at the same address, but each must have a unique identifier (say, a customer number) within the system.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1DL-HIujQFiXC7MK6VQ8XTatvUofeGTZ4B3XguFiso4qbTeL9sCmbTy1m2Sq5cqT3JlObGnGEzHiZMn6fu9tv6g1YVt-ubHXBwPyMf7gpAMsSNaY741HH8HtozTjj6QJLhW5ZR-MemmA/s1600/entity1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1DL-HIujQFiXC7MK6VQ8XTatvUofeGTZ4B3XguFiso4qbTeL9sCmbTy1m2Sq5cqT3JlObGnGEzHiZMn6fu9tv6g1YVt-ubHXBwPyMf7gpAMsSNaY741HH8HtozTjj6QJLhW5ZR-MemmA/s400/entity1.png" width="400" /></a></div>
<br />
<br />
An ENTITY has a life cycle over which its form and content can change, but a thread of continuity must be maintained. Identity plays an important role here, because it is what allows the ENTITY to be tracked effectively.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_MIwfXS2v3Xjw35v3sfO9O9ot4efZaraxFgY1pSCG7PPFRsgSJ9DzAO7Aq32gjbR7jboPiS9AB190mLmsDoNOnTsGp9Fza2G8lAwWlPHMnvlOeXMK1vTDZ04h1sXGXZHIkolb-4Sruj4/s1600/entity2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="142" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_MIwfXS2v3Xjw35v3sfO9O9ot4efZaraxFgY1pSCG7PPFRsgSJ9DzAO7Aq32gjbR7jboPiS9AB190mLmsDoNOnTsGp9Fza2G8lAwWlPHMnvlOeXMK1vTDZ04h1sXGXZHIkolb-4Sruj4/s400/entity2.png" width="400" /></a></div>
<br />
<br />
Care must be taken when generating the identity of an ENTITY. The identity must be guaranteed to be unique within the system, no matter how the system is developed and whether or not it is distributed. Once generated, it must not change. Sometimes a single data attribute (guaranteed to be unique) can be the identity of an entity, for example an order number, transaction number, account number, or customer number. Sometimes you need a combination of data attributes to define an identity. For example, daily newspapers might be identified by the name of the newspaper, the city, and the date of publication.
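<br />
<br />
In code, this usually means that equality of an ENTITY is based on its identity alone. Here is a minimal sketch, assuming a customer number is the identity; the <code>Customer</code> class, its fields and the <code>moveTo</code> method are my own illustration, not code from the book.

```java
import java.util.Objects;

/** Hypothetical ENTITY: equality is based on identity, never on attributes. */
final class Customer {
    private final String customerNumber; // identity: assigned once, never changes
    private final String name;
    private String address;              // attributes may change over the life cycle

    Customer(String customerNumber, String name, String address) {
        this.customerNumber = Objects.requireNonNull(customerNumber, "customerNumber");
        this.name = name;
        this.address = address;
    }

    /** The content changes, but the thread of continuity (the identity) stays. */
    void moveTo(String newAddress) {
        this.address = newAddress;
    }

    String customerNumber() {
        return customerNumber;
    }

    String name() {
        return name;
    }

    String address() {
        return address;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Customer
                && customerNumber.equals(((Customer) other).customerNumber);
    }

    @Override
    public int hashCode() {
        return customerNumber.hashCode();
    }
}
```

Two customers with the same name and address but different customer numbers are different customers, while a customer who moves to a new address is still the same customer.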
<br />
<br />
<i><b>
What about VALUE OBJECTS?
</b></i>
<br />
<br />
A VALUE OBJECT does not have a conceptual identity; it represents a descriptive aspect of the domain. For example, a customer may be modelled as an ENTITY with an identity, but the customer's phone number is a VALUE OBJECT.
<br />
<br />
VALUE OBJECTS help us to design and code in a better way:<br />
<ul style="text-align: left;">
<li>They make implicit concepts explicit.</li>
<li>They help us write a clear service API.</li>
<li>They take the responsibility of data validation and error handling.</li>
<li>They help us write testable and maintainable code.</li>
</ul>
Take the example of a phone number. We can declare its type as String in the Customer class.
<br />
<pre><code>
public class Customer {
    ......
    private String phoneNumber;
    ........
}
</code>
</pre>
But if you think carefully, declaring phoneNumber as a String does not say much about it. The concept of a phone number stays implicit, which may introduce bugs and lead you to write awkward, duplicated code. When you make it explicit, things become clearer:
<br />
<pre><code>
public class Customer {
    ......
    private PhoneNumber phoneNumber;
    ........
}
</code>
</pre>
It also opens up the option of adding behaviors to PhoneNumber. You can add data validation and error handling to this object. Here is the sample code of PhoneNumber [2]:
<br />
<pre><code>
public class PhoneNumber {
    private final String number;

    public PhoneNumber(String number) {
        if (!isValid(number))
            throw ...
        this.number = number;
    }

    public String getNumber() {
        return number;
    }

    // A number is considered valid when it consists of digits only.
    public static boolean isValid(String number) {
        return number.matches("[0-9]*");
    }

    // Returns the shortest prefix recognised as an area code, or null if there is none.
    public String getAreaCode() {
        String prefix = null;
        for (int i = 0; i &lt; number.length(); i++) {
            String begin = number.substring(0, i);
            if (isAreaCode(begin)) {
                prefix = begin;
                break;
            }
        }
        return prefix;
    }

    private boolean isAreaCode(String prefix) { ... }
}
</code></pre>
</div>
<div dir="ltr" style="font-family: "Trebuchet MS",sans-serif; font-size: 11pt; text-align: left;" trbidi="on">
If you look carefully, you will find that we have put the computational complexity in this object rather than in the service layer. So the service layer carries less of a burden, and code duplication is reduced. Less code means fewer bugs. Now you can write a set of JUnit test cases for this object.
<br />
<br />
Let's see how VALUE OBJECTS help us write a clear service API. For example, say we have an API that takes a name, age, address and phone number to add a customer:
<br />
<br />
<code>
void addCustomer(String, int, String, String);
</code>
<br />
<br />
Is the above API clear to you? You see String and int all over, nothing meaningful. How about writing it this way:
<br />
<br />
<code>
void addCustomer(Name, Age, Address, PhoneNumber);
</code>
<br />
<br />
Now the API is readable. Even a non-technical person can understand the above method signature. One thing we need to remember: VALUE OBJECTS should be immutable. For example:
<br />
<br />
<code>
Money money1 = new Money("EUR", 30); <br />
Money money2 = new Money("EUR", 40); <br />
Money money3 = money1.add(money2); <br />
</code>
<br />
When you add money2 to money1, you are not altering money1; instead you get back a new Money object (assigned to money3) that represents the two amounts of Money added together. Ensuring the immutability of a value object is important if you want to share it safely. It cannot be changed except by full replacement.
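<br />
<br />
A minimal sketch of what such an immutable Money class could look like is shown below. It assumes whole-unit amounts stored in a long and a plain currency code string, which is a simplification for this post; a real implementation would need proper handling of decimals and currencies.

```java
/** Sketch of an immutable VALUE OBJECT: all fields are final and add() returns a new object. */
final class Money {
    private final String currency;
    private final long amount; // whole units only, a simplification for this example

    Money(String currency, long amount) {
        if (currency == null || currency.isEmpty()) {
            throw new IllegalArgumentException("Currency is required");
        }
        this.currency = currency;
        this.amount = amount;
    }

    /** Never modifies this object or the argument; the sum is a new Money. */
    Money add(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException(
                    "Cannot add " + other.currency + " to " + currency);
        }
        return new Money(currency, Math.addExact(amount, other.amount));
    }

    /** Value objects are equal when all of their attributes are equal. */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Money)) {
            return false;
        }
        Money that = (Money) o;
        return currency.equals(that.currency) && amount == that.amount;
    }

    @Override
    public int hashCode() {
        return 31 * currency.hashCode() + Long.hashCode(amount);
    }

    @Override
    public String toString() {
        return currency + " " + amount;
    }
}
```

Because there are no setters and the class is final, a Money instance can be passed around and shared freely without defensive copies.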
<br />
<br />
<i><b>
Services
</b></i>
<br />
<br />
A SERVICE is a standalone domain operation that you cannot fit in an ENTITY or VALUE OBJECT. It is defined purely in terms of what it can do for a client.
<br />
<br />
Evans mentioned in his book that a good SERVICE should have three characteristics:
<br />
<ul>
<li>The operation relates to a domain concept that is not a natural part of an ENTITY or VALUE OBJECT</li>
<li>The interface is defined in terms of other elements of the domain model</li>
<li>The operation is stateless</li>
</ul>
<br />
SERVICES can be partitioned based on layer:
<br />
<ul>
<li>
APPLICATION SERVICES sit above the domain services and handle cross-cutting concerns such as transactions and security. They also talk to the presentation layer to get the input or send the output back.
</li>
<li>
DOMAIN SERVICES deal with business logic that cannot live in an ENTITY, for example transferring funds between two accounts.
</li>
<li>
INFRASTRUCTURE SERVICES are those services that are more technical in nature, for example sending out an email or SMS text message.
</li>
</ul>
<br />
Operation names in SERVICES should come from the UBIQUITOUS LANGUAGE. Parameters and results of these operations should be domain objects to make them explicit. SERVICES should be used carefully: don't take all the behaviors away from ENTITIES and VALUE OBJECTS and put them in SERVICES.
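<br />
<br />
To make the funds transfer example concrete, here is a rough sketch of a stateless DOMAIN SERVICE. The <code>Account</code> and <code>FundsTransferService</code> classes and their methods are hypothetical names chosen for this post; transactions and persistence, which an APPLICATION SERVICE would take care of, are left out.

```java
import java.math.BigDecimal;

/** Hypothetical ENTITY: it keeps the behavior that naturally belongs to an account. */
final class Account {
    private final String accountNumber;
    private BigDecimal balance;

    Account(String accountNumber, BigDecimal openingBalance) {
        this.accountNumber = accountNumber;
        this.balance = openingBalance;
    }

    void withdraw(BigDecimal amount) {
        if (balance.compareTo(amount) < 0) {
            throw new IllegalStateException("Insufficient funds in " + accountNumber);
        }
        balance = balance.subtract(amount);
    }

    void deposit(BigDecimal amount) {
        balance = balance.add(amount);
    }

    BigDecimal balance() {
        return balance;
    }
}

/** Hypothetical DOMAIN SERVICE: stateless, named after the domain operation. */
final class FundsTransferService {

    /** A transfer involves two accounts, so it does not belong to either of them. */
    void transfer(Account from, Account to, BigDecimal amount) {
        if (amount.signum() <= 0) {
            throw new IllegalArgumentException("Transfer amount must be positive");
        }
        from.withdraw(amount); // the ENTITY still guards its own invariant
        to.deposit(amount);
    }
}
```

The service only coordinates. The rule that an account cannot be overdrawn stays inside Account, so the ENTITY does not turn into a bag of getters and setters.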
<br /><br />
References:
<br /><br />
1. <a href="http://domaindrivendesign.org/books/evans_2003" target="_blank">Domain-Driven Design by Eric Evans</a> <br />
2. <a href="http://www.infoq.com/presentations/Value-Objects-Dan-Bergh-Johnsson" target="_blank">Power Use of Value Objects in DDD by Dan Bergh Johnsson</a><br />
3. <a href="http://www.martinfowler.com/bliki/EvansClassification.html" target="_blank">EvansClassification by Martin Fowler</a>
</div></div>