Wednesday, September 20, 2017

CSS Selectors - Level 1 & 2

As my interest in CSS grew, I have come across more CSS selectors that I should have used in my projects, but haven't due to ignorance. I wrote this blog post more as a way to freeze these selectors in my brain so that I remember to use them next time. To keep things simple, I make two assumptions for this post:
  1. We're trying to use CSS for HTML, and
  2. We're only using CSS selectors up to CSS Selectors Level 3. Level 4 is not yet a W3C Recommendation; when that happens, I shall write a new blog post.
Also, this blog post talks only about selectors in CSS 1 & 2. Selectors in CSS 3 will be looked at in another blog post.

Level 1 CSS selectors

When CSS was first announced, it was called Level 1 CSS. It had only a few selectors. (Later "versions" of CSS added more selectors, as demands from web designers grew.) Because CSS 1 is quite old, it's also extremely well-known among developers & designers. Thus, for the most popular selectors in CSS 1, I'll skim over them as most developers/designers know these by now.

Type selector

This selector contains only the name of an HTML tag. The associated style is applied to all elements of that type on the page.
An example: span {color:red}

This will make all text inside <span> tags red in colour.

ID selector

The ID selector contains the value of an HTML element's ID attribute. Only the element that has that particular value for the ID attribute will have the style applied to it. The ID selector is constructed by writing the hash (#) symbol followed immediately by the element's ID, without any space in between.

Eg: #content {color:red} will render the HTML element with ID 'content' in red coloured text.

Class selector

The class selector defines a style that can be applied to a class (i.e., a category) of HTML elements. Whether an element belongs to the class is determined by the class attribute of the HTML element. If one of the space-separated values of the class attribute matches the name in the class selector, then the style is applied.

As an example, consider two span tags,
<span id="span1">Content</span>
<span id="span2" class="important_content">Warning message</span>

and a style definition,
.important_content {color: red;}

In the above case, only the <span> tag with id 'span2' will have the content in red colour. This is because it is the only <span> tag that belongs to the class of elements represented by 'important_content'. Other <span> tags in the same document do not belong to that class of elements and hence will not have the content in red colour.

Because this selector is used to categorize elements into a single grouping, you can have multiple elements having the same value for the class attribute - all those elements will be grouped into a single category represented by that class selector, and the style will be applied for all those elements.

Descendant selector

The descendant selector is used to select any element that is a descendant of another specified element, at any depth.

The syntax is ancestor targeted_element. The style is applied to targeted_element.

Here is an example. Suppose you have the following HTML:
<div>
    <span>Span tag inside a div tag</span>
    <ul>
        <li>
            <span>This is the first list item.</span>
        </li>
        <li> This is the second list item.</li>
    </ul>
</div>
with this CSS applied:
div span {color:blue;}
In the output, both <span> texts are rendered in blue. What's happening here is that all <span> tags at all levels of the <div>'s subtree get the style applied.


Pseudo-classes

CSS 1 had 3 pseudo-classes: :link, :visited, and :active. These work only on links and therefore, defining them on other elements has no effect. The :link pseudo-class specifies how unvisited links are shown, the :visited pseudo-class specifies how links pointing to URLs you have already visited are shown, and the :active pseudo-class specifies how a link appears while it is being activated (for example, while the mouse button is pressed on it).


Pseudo-elements

CSS 1 had 2 pseudo-elements: :first-letter & :first-line.
Note that in CSS 3, the single colon (:) at the beginning of each pseudo-element has been replaced by the double colon (::). Thus, if you're trying out these examples on a modern browser, prefer double colons (browsers still accept the single-colon form of these older pseudo-elements for backward compatibility). For convenience, my example code already has them replaced.

The :first-letter pseudo-element

The :first-letter pseudo-element is used to specify a styling only for the first letter of the element's content. Sometimes, it is used to indicate that a new paragraph has started; other times it is used for publishing scenarios such as drop caps.
As an example, let us assume that you have the following HTML content:
<div id=first_div>
    This is a single line of text.
</div>
<div id=second_div>76 is a number.</div>
<div id=third_div>
    "Let's go", he said, with determination.
</div>
and the following CSS is applied to the HTML:
#first_div::first-letter {font-size: 20pt;}
#second_div::first-letter {font-size: 20pt;}
#third_div::first-letter {font-size: 20pt;}
In the result, the first letter of each <div> tag has the style applied, which increases its font size. We are able to do this without wrapping the first letter in any special tag (eg: a <span> tag). If not for the :first-letter pseudo-element, wrapping the first letter in a special tag would be the only way to achieve the same effect.

This is also the reason why :first-letter is a pseudo-element. The CSS selector applies the style as if there was a tag wrapped around the first letter with that special styling.

What is also noticeable is that if the content of the element starts with quotes, then the :first-letter style applies to both the quotes and the letter following them.

The :first-line pseudo-element

The :first-line pseudo-element is used to specify a styling only for the first line of the element's content. This is often required in publishing scenarios where publishers may prefer to highlight the first line of a new paragraph to make it easier to identify that a new paragraph is starting here.
As an example, let us assume that you have the following HTML content:
<div>
    They walked into the forest, not entirely unmindful of the animals that lurked there. But to them, much more than the dangers the animals represented, was the fear of losing out - of not achieving their goal.
</div>
and the following CSS is applied to the HTML:
div::first-line {text-transform:uppercase;}
In the result, only the first line of the <div> tag is in upper case. Notice that we mean first line & not first sentence, i.e., CSS does not look for a full-stop (aka period) to indicate the end of the first sentence. It only applies the style until the content in the <div> tag wraps to the next line, at which point the style application stops.

If you resize your browser window to make the line longer or shorter, more (or less) content will be made upper-case.

Level 2 CSS selectors

Let us now look at what was newly introduced in CSS 2.

Universal selector

The universal selector matches every element in an HTML document. It is written using the asterisk (*) symbol.
An example: *{color:red}
This will make all text red in colour.

Attribute selectors

CSS 2 introduces the concept of attribute selectors, where you can specify that your styles must match only those elements that have certain attributes or characteristics of attributes. CSS 2 introduces 4 attribute selectors (CSS 3 adds 3 more). Let's take a look at the 4 attribute selectors below:

Presence selector

This selector matches those elements that have the attribute specified. The attribute only has to be present in the element - it can have any value or it can have no value.

The syntax of the selector is element[attribute].

As before, here's an example. Let's assume you have the following HTML code:
<div id=first_para>
    <span>This is the first paragraph.</span>
</div>
<div id=second_para>
    <span>And this is the second.</span>
</div>
and the following CSS:
div[id] {color:blue;}
In the result, the blue colour styling is applied to both <div> tags, even though they have different HTML IDs. That is because the selector only checks for the presence of the attribute - it doesn't check the attribute's value.

Where would such selectors be used? In HTML, there are places where an attribute just has to exist for some behaviour to be triggered - as per the HTML spec, the attribute doesn't need a value. Examples are the checked attribute for checkboxes and the selected attribute for list box options.

Value selector

This selector matches those elements that have the attribute specified, and whose value for that attribute is exactly the value specified in the selector. Note that for attributes such as id and class, the comparison is case-sensitive - the selector will match the element only if the value of the attribute has the same case as the one specified in the selector.

The syntax of the selector is element[attribute=value].

Let's assume you have the following HTML:
<div id=first_para>
    <span>This is the first paragraph.</span>
</div>
<div id=First_para>
    <span>This is the second paragraph, but has the same ID, different only in case.</span>
</div>
and this CSS:
div[id=first_para] {color:blue;}
In the output, only the <div> tag whose ID attribute has first_para as its value gets the style applied. The other <div> tag does not get the style because its value does not match the one in the selector exactly, including case.

Attribute sub-string selectors

There are 2 attribute sub-string selectors available in CSS 2.

The first selector matches those elements that have the attribute specified, and that attribute's value is the value specified in the selector. However, this selector will also match if the attribute has multiple values separated by spaces, and one of those values is the value specified in the selector.

The syntax of the selector is element[attribute~=value].

Here's an example. Assume that you have the following HTML:
<div id="content header">
    <span>CSS 2 is now a Recommendation</span>
</div>
<div id="content gist">
    <span>The standards body has approved the various new features in CSS 2.</span>
</div>
<div id="content body">
    <span>Detailed info on CSS 2 is found here.</span>
</div>
and the following CSS
div[id~=header] {text-transform:uppercase;}
div[id~=gist] {font-style: italic;}
then in the output, the first <div> is shown in upper case and the second in italics. The first rule matched the first <div> tag because its id attribute contains the word header. Similarly, the second rule matched the second <div> tag because its id attribute contains the word gist.
Again, as in previous selectors, the selector matches only if the attribute matches case-sensitively.

The second attribute sub-string selector matches those elements that have the attribute specified, and that attribute's value is the value specified in the selector. However, this selector will also match if the attribute has multiple values separated by hyphens, and that hyphenated set of values starts with the value specified in the selector.

The syntax of the selector is element[attribute|=value].

Let's look at an example. Assume the following HTML:
<div id="header-content">
    <span>CSS 2 is now a Recommendation</span>
</div>
<div id="gist-content">
    <span>The standards body has approved the various new features in CSS 2.</span>
</div>
<div id="body-content">
    <span>Detailed info on CSS 2 is found here.</span>
</div>
and the CSS applied is:
div[id|=header] {text-transform:uppercase;}
div[id|=gist] {font-style: italic;}
then in the output, as before, the first rule matched the first <div> tag (shown in upper case), and the second rule matched the second <div> tag (shown in italics).
Again, the attribute value is matched case-sensitively.
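
To pin down the four matching rules, here is a rough model of them in Python. This is not CSS and not how a browser works; it is only a sketch of the comparisons described above, assuming an element's attributes are given as a plain dictionary. All four function names are my own.

```python
def has_attribute(attrs, name):
    """[name]: the attribute is present, whatever its value."""
    return name in attrs


def equals_value(attrs, name, value):
    """[name=value]: the attribute value is exactly `value`."""
    return attrs.get(name) == value


def contains_word(attrs, name, value):
    """[name~=value]: `value` is one of the space-separated words."""
    if not value or any(ch.isspace() for ch in value):
        return False  # an empty value, or one containing spaces, never matches
    return value in attrs.get(name, "").split()


def starts_with_hyphenated(attrs, name, value):
    """[name|=value]: the value is `value` itself, or begins with `value-`."""
    actual = attrs.get(name)
    return actual is not None and (actual == value or actual.startswith(value + "-"))


print(contains_word({"id": "content header"}, "id", "header"))
print(starts_with_hyphenated({"id": "header-content"}, "id", "header"))
```

Note that contains_word compares whole words, so "head" would not match "content header", and starts_with_hyphenated requires the hyphen, so "headers" would not match "header".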

Child selectors

CSS 2 introduces the concept of child selectors, where you can specify which children of an element your selector should match. Note that this is different from the descendant selectors we saw in the CSS 1 section - the child selectors in CSS 2 only select children, while the descendant selector can go deeper in the tree.

CSS 2 introduces 2 child selectors, while CSS 3 introduces a lot more. Let's take a look at the 2 CSS 2 child selectors below:

:first-child pseudo-class

The :first-child pseudo-class matches those elements which are the first child of a parent element. In the pseudo-class, you only specify the type of the child element - no information about the parent needs to be provided. The pseudo-class searches for all elements of that type, and then runs through each result to check if it is a child of some other element, and if yes, checks further to determine if it is the first child. The result of this check gives a list of elements for which the style is applied.

The syntax is child-element:first-child.

Consider this example HTML:
<body>
<span>under the body tag</span>
<div>
    <span>First child under the div tag</span>
    <span>Second child under the div tag</span>
</div>
</body>
and this CSS that is applied on that HTML:
span:first-child {color:blue;}
In the output, the first and second <span> tags are shown in blue. The <span> tag immediately following the opening <body> tag is a child of the <body> tag, and it is its first child. Thus the styling is applied to it.

For the <span> tags inside the <div> tag, the first <span> tag is the first child of the <div> & hence got the styling applied, while the second <span> did not.

Again, the advantage here is that we got this styling applied without the need for any special elements to wrap around the first child. Also, in cases of DHTML, the browser will take care of applying the style in case children are added/removed.
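
To make the matching rule concrete, here is a small Python sketch that models span:first-child on the example above. It is not CSS and not how a browser implements selectors; it assumes the markup is well-formed enough for Python's xml.etree.ElementTree to parse, and the function name first_child_matches is my own.

```python
import xml.etree.ElementTree as ET


def first_child_matches(root, tag):
    """Return elements named `tag` that are the first child element of their parent."""
    matches = []
    for parent in root.iter():       # visit every element in document order
        children = list(parent)      # direct child elements only
        if children and children[0].tag == tag:
            matches.append(children[0])
    return matches


body = ET.fromstring(
    "<body>"
    "<span>under the body tag</span>"
    "<div>"
    "<span>First child under the div tag</span>"
    "<span>Second child under the div tag</span>"
    "</div>"
    "</body>"
)

matched_text = [el.text for el in first_child_matches(body, "span")]
print(matched_text)
```

The second <span> under the <div> is skipped because it is not at position zero among its parent's children, which is the same check the pseudo-class performs.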

Generic child selector

Another child selector available is the more generic version of the :first-child pseudo-class. It is used to match any element that is a child of any other element. This selector allows you to specify even the parent element of the targeted child element.

The syntax is parent > child.

Assume the following HTML:
<div>
    <span>first span</span>
    <span>second span</span>
    <ul>
        <li>
            <span>This is the first list item.</span>
        </li>
        <li> This is the second list item.</li>
    </ul>
</div>
and the following CSS:
div>span {color:blue;}
In the output, the first two <span> tags under the <div> tag are matched and shown in blue. This is because the selector matches all children, not just the first child.

The <span> tags inside the <ul> tags are not matched because the selector only matches children of the parent, not grandchildren & other descendants.
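
The difference between div > span and div span can be tried out with a short Python sketch. This only models the two relationships using Python's xml.etree.ElementTree (findall for direct children, iter for all descendants); it is not how a browser evaluates selectors, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring(
    "<div>"
    "<span>first span</span>"
    "<span>second span</span>"
    "<ul>"
    "<li><span>This is the first list item.</span></li>"
    "<li>This is the second list item.</li>"
    "</ul>"
    "</div>"
)

# Like "div > span": only <span> elements that are direct children of the <div>.
child_spans = [span.text for span in div.findall("span")]

# Like "div span": every <span> anywhere in the <div>'s subtree.
descendant_spans = [span.text for span in div.iter("span")]

print(child_spans)
print(descendant_spans)
```

The <span> inside the list item shows up only in the second list, since it is a descendant of the <div> but not a child of it.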

Adjacent sibling selectors

CSS 2 also introduces the concept of sibling element selectors. The adjacent sibling selector is used to select those elements which have a specified sibling element that appears immediately before them. Note that the specified sibling must appear before & not after.

The syntax for this selector is sibling + targeted_element. The style will be applied to targeted_element.

Consider this HTML:
<div>
    <ul>
        <li>
            <span>This is the first list item.</span>
        </li>
    </ul>
</div>
<span>First span under div</span>
<span>Second span under div</span>
and this CSS:
div+span {color:blue;}
then in the output, only the first <span> tag after the <div> is shown in blue. It has the style applied because its immediately preceding sibling is the <div> tag specified in the selector.

Also, the <span> tags inside the <ul> tag are not selected by the selector since they do not have any siblings. Similarly, the final <span> tag is not selected since its immediately preceding sibling is another <span> tag.

New pseudo-elements

CSS 2 introduces 2 new pseudo-elements, ::before & ::after. These pseudo-elements are used to render content either before or after the element. What's the big deal, you may ask? The deal is this: Sometimes you have a need to render repeating content, and the repeating content may need to appear either before or after a set of elements. An example of such a usecase would be adding red asterisks after every label in a form to indicate required fields. Instead of writing it into the HTML, you can style it as a CSS rule.

Assume we have the following form:
<form>
    <label for=id_no class=required>Identification Number</label>
    <input type=text name=id_no />
    <label for=name>Name</label>
    <input type=text name=name />
    <input type=submit />
</form>
with this CSS:
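.required::after {
    content: "*";
    color: red;
}
It results in a red asterisk being shown after the "Identification Number" label, but not after the "Name" label.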

Examples of other repeating content are page numbers, chapter numbers, etc.
Note that in CSS 3, the single colon (:) at the beginning of each pseudo-element has been replaced by the double colon (::). Thus, if you're trying out these examples on a modern browser, prefer double colons. For convenience, my example code already has them replaced.

New pseudo-classes

The :lang pseudo-class

CSS 2 introduces a new pseudo-class, :lang. This pseudo-class is used to set styles based on the language of an element's content. The language may be set by multiple mechanisms depending upon the markup language, but in HTML, it is usually set using the lang attribute on the <html> tag.

Consider the HTML:
<html lang="fr">
    <head>
        <link href="lang.css" rel=stylesheet />
    </head>
    <body>
        <div>some content</div>
    </body>
</html>
and the content in lang.css:
div:lang(fr) {color:blue;}
then the output is that the text is rendered in blue colour.

If the language were changed to English ("en") in the HTML, but not in the CSS file, then the text would be rendered in the browser's default colour.

The :hover & :focus pseudo-classes

The :hover pseudo-class is used to apply styling when the mouse pointer is currently over the element. The :focus pseudo-class is used to apply styling when the element receives focus, which can be due to mouse-click or a keyboard event.

As an example, consider the following HTML:
<input type=submit />
with the following CSS applied:
input:hover {cursor: pointer;}
input:focus {color: blue; border: 2px black solid;}
When we press the Tab key to move focus to the button, its text turns blue and it gets a 2px solid black border.

You will also notice that if you move the mouse over the button, the mouse pointer will become the hand pointer instead of the typical arrow pointer.

Friday, October 07, 2016

Certifications - "Do they benefit me?" is the more important question

There's a lot of controversy regarding certifications.
Some people think certifications have no value. Much of this seems to stem from the possibility that a person holding a certification may not actually have absorbed the knowledge needed to be effective and for his employers to reap the benefits. Some others think that like all exams, it's easy to cheat and get the certification. Some others think that the fact that some certifications have limited validity means their value to employers expires within a certain period of time. That is, unless the person demonstrates a constantly-learning attitude even without certifications coming into play, having a certification won’t help.
In my opinion, while all of these reasons may be correct, one shouldn’t ignore certifications. Here are some reasons why:
  • To folks outside the industry, certifications provide a proof that you have skills and those skills have been validated by a standards authority. To explain that sentence better, I quote here something I learnt on the Internet: anyone can drive vehicles without a license, but when you want to hire a driver, you’ll hire one with a license. When looking for a driver, you’ll avoid looking for a driver without a license because you do not want to add to the problems you already have on your plate (like getting into accidents). The only way to achieve this is to look for someone who has had his skills validated by a standards authority (here, the government licensing authority).
  • Another reason for certifications is that they are a great way to have a deep understanding of the technology involved. If you’re like most developers, then you probably have worked in a lot of technologies over the years. After doing so for a few years, some developers decide that it’s better to focus on a technology and become an expert at it, rather than jump from one technology to the other and skim only the basics of each technology. Once such a decision is made, the best way to achieve it within a reasonable timeframe is to get a certification in the technology. To get the certification, one will have to look out for some courses/books related to the certification. These courses/books teach basic & advanced concepts and also have mock exams where you can test your skills before taking the actual exam. Taking these mock exams (with all honesty & seriousness) helps in building your understanding of the technology, leading to better career opportunities.
    A lot of folks will say, “This isn’t really different from the usual advice that developers must read books”. True, but taking a certification really makes you understand a technology, since you have to pass the exam (or at least the mock tests), rather than just reading a book & potentially forgetting the concepts later.
  • If you’re totally new to the software field, having a certification helps to get a foot into the door. It demonstrates that you took extra effort to understand something, and that you have some basic knowledge. Keep in mind that that is all a certification can do - if you can’t code even though you have a certification, then you’ll not have a chance.
I want to expand on that last point. As Jeff Atwood says, software is a field where you can expect to work on multiple technologies & frameworks inside that technology. You’re often expected to demonstrate that you can do great work using a technology you may only have a fair knowledge of. This means there’s going to be a frustrating period in which you ramp up on the technology only to be moved to a new technology later. It also means you’re going to get co-workers who are new to the technology, and are ramping up really slowly, making you wish they had read up on the basics before joining your team. In both cases, certifications come to your rescue - they indicate that there is some basic knowledge that you (or your co-worker) have.
So to summarize, do not wish away certifications just because someone said so. Also, do not do a certification just because I said so. Think about the benefits that you get out of the certification. Think about the time invested in the certification and whether it is time that would return more value if invested elsewhere. Some certifications are valuable only for a certain time period; in that case, are you ok with your certification losing value some years down the line, or do you think you can keep updating the certification as the years go by? Think through the pros & cons from your angle (not from mine or someone else's), and take the decision that best fits you. 

Saturday, December 19, 2015

Why a mixed format is not recommended

While pairing with developers, I have often noticed that they have a tendency to periodically do a mixed format.

What is a mixed format?

Now I have no idea whether this is the official term, but here is what I mean when I say, “mixed format”. A mixed format is when a developer, working on some code, comes across some other code that is not formatted as per the project’s conventions. This code could span a few lines, or in worse cases, a whole file. The developer immediately invokes his editor’s format command, and formats the offending lines, or the whole file. With a satisfied smile on his face, the developer moves on to complete whatever work he was originally tasked to do. He then creates a commit that includes:
  • the work he was originally tasked to do, and
  • the formatting that he set right.

What’s wrong here?

Now, from the point of view of clean code and team work, formatting is not wrong. However, I do not recommend crafting a commit that mixes both format changes and logic changes, when the following conditions hold true:
  1. The format changes are not related to the actual lines that the logic change encompasses
  2. The format changes outnumber the logic changes
Why? Consider what happens when the developer goes ahead and checks in his code to the VCS. Other developers reviewing his commit immediately notice that the commit’s code changes are too many - this results in an impression forming in the reviewer's mind which can range between “Wow, this is a large commit. I need to go line by line” to a feeling of just giving up. With inexperienced or bored developers, it is usually the latter.
Also consider what happens when sometime in the future, a developer realizes that your commit introduced a line that causes a bug. In order to ensure a clean fix, he opens your commit with the intention of understanding what you intended to fix. And he arrives at the same realisation - your code changes are too many. Without any choice, he is forced to go through each line to understand what it does. Imagine his frustration when most lines turn out to be formatting changes, and hidden among the formatting changes is the actual change he’s looking for.
The lesson here is to avoid large formatting changes mixed with logic changes. Prefer to stick to formatting only those lines where your feature/bug also demands a change. If you can’t avoid this, then make two commits - one for the feature/bug changes, the other just for formatting changes.

This is only a recommendation, not a rule

As soon as you read this, please don’t fire up the comments editor or your blog editor to write a comment/blog about why I am wrong. I understand this is basically a Considered Harmful essay, and I know that Considered Harmful essays are considered harmful. With that in mind, I’ll only say that the above is a recommendation, not a rule. When making such a commit, please do think about how a future you would feel if you came across such a commit, and how you’d react. 

Sunday, December 21, 2014

Git: What are diffs and hunks?

When I was learning Git for the first time many years ago, one of the features that made me go, "Wow!! That's something I have really wanted all these years!" was the ability to choose which changes to commit among all the changes in a given file. I hadn’t seen this in the other version control systems I’d used, which were CVS and SVN.
Here’s an example of what I am trying to illustrate. Suppose I have a Java source file with the following contents,
class Employee {
     private String firstName;
     private String lastName;

     Employee(String firstName, String lastName) {
          this.firstName = firstName;
          this.lastName = lastName;
     }

     public boolean equals(Employee e) {
          if (!(e instanceof Employee))
               return false;
          return e.firstName.equals(this.firstName) && e.lastName.equals(this.lastName);
     }
}
Ignore the fact that there's no hashCode() implementation, please!!
You decide to add more functionality to the class, namely, a grade instance variable and a toString() method that prints out who the employee is and what he does. The file now looks like this:

class Employee {

     private String firstName;
     private String lastName;
     private String grade;

     Employee(String firstName, String lastName, String grade) {
          this.firstName = firstName;
          this.lastName = lastName;
          this.grade = grade;
     }

     public boolean equals(Employee e) {
          if (!(e instanceof Employee))
               return false;
          return e.firstName.equals(this.firstName) && e.lastName.equals(this.lastName);
     }

     public String toString() {
          return "I am " + this.firstName + " " + this.lastName + ", working as " + this.grade;
     }
}
Ignore the fact that grade is not part of equals(), please!!
When you do a git diff on the file, Git shows the newly added lines (the output is examined in detail further below).

When you do a git add at this point, all the newly introduced code will be ready for commit. Let’s say you want to add the toString() function as a separate commit. In other VCSs, that's not simple. You will have to maintain two copies of the file, with one copy introducing the grade variable, and another copy introducing toString(). This is cumbersome, but in Git, it is very easy. You just do
git add -p
which allows you to choose which pieces of code change to commit. For the above example, doing git add -p would show you the first piece of code change, followed by a prompt asking whether to stage this hunk.

At this point, keying in 'y' will add this piece to the index, after which the next piece of code change is shown, and so on…
When I learnt this, I thought, "All that’s fine, but what is the word ‘hunk’ doing there in ‘Stage this hunk?’ What does it mean anyway?"
To know what a hunk is, you’ll have to know more about the output of the diff command. Note that we are not talking about git diff, but just diff.

Understanding the diff command

diff is the Unix command that generates a report documenting the differences between two files. According to Wikipedia, given two files, a and b, with b being an updated version of a, diff reports what changes should be made to a to turn it into b.
The report that diff generates can be in 3 forms. They are: a) Edit script, b) Context format, or c) Unified format. With git diff, we get the Unified format.
The unified format, explained in short, goes like this:
The entire output of diff is called ‘diff’. That’s why people often say, “Send me the diff”. They are actually asking for the output of the diff command.
A diff begins with two lines that indicate the two files being compared. The first line begins with ‘---’ and indicates the original file, while the second line begins with ‘+++’ and indicates the newer file. Line additions are preceded with a  ‘+’ symbol, while line deletions are preceded with a ‘-’ symbol. Line modifications are represented as a combination of line deletion and addition.
Now, when a change occurs to a file, the change can be:  a) in only one line, b) in consecutive lines, or c) in lines spread all over the file.
Thus, the receiver of a diff would like to know which line numbers in the original unchanged file were changed. Hence, it is enough if the output of diff includes a special line that indicates the starting line position of the change, as well as the destination line position, followed by the actual changes. The destination line position is included since earlier changes in the same diff could have pushed the original line further down the file.
However, (especially in open-source projects), it is possible that two changes are applied to a file by two separate users at the same line. When integrating these two changes, it is not useful if you only have the line numbers. You also need to provide some context, by which we mean some lines before and after the changed line. This is useful when applying conflicting changes like the one above, as we can use it to determine how the second change should fit in on the first change.
The unified format handles both by providing context around the changed line, and also providing a special line that indicates where in the file the first line of context starts, and how many lines (context plus changed lines) are covered. To indicate that these special lines exist only for the receiver’s understanding and are not part of the file's content, the Unified format surrounds them with ‘@@’ symbols. Such lines are called range information lines. The format of a range information line is:
@@ -<<starting line number of context in original file,number of lines of context from original file>> +<<starting line number of context in modified file,number of lines of context from modified file>> @@
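
As a rough illustration of that format, here is a small Python sketch that pulls the four numbers out of a range information line. The function name parse_range_line and the regular expression are my own, not anything Git or diff provides. It also relies on one detail of the unified format not covered above: when a line count is omitted, it is taken to be 1.

```python
import re

# "@@ -start,count +start,count @@", optionally followed by extra text.
RANGE_LINE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_range_line(line):
    """Return (old_start, old_count, new_start, new_count) for a range information line."""
    match = RANGE_LINE.match(line)
    if match is None:
        raise ValueError("not a range information line: " + repr(line))
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # omitted count means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_range_line("@@ -1,6 +1,7 @@"))
print(parse_range_line("@@ -12,5 +13,9 @@ class Employee {"))
```

Any text after the closing ‘@@’ (such as the "class Employee {" above) is ignored by this sketch.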

Understanding diff

This should now help us understand the output of the git diff that we ran on the file earlier. Let’s walk through it piece by piece.

The first two lines that you see,
diff --git a/<file> b/<file>
index b2ea747..cbdaf9e 100644
are generated by Git (<file> stands for the path of the file being compared). Beyond these is the actual diff output, so let's ignore them and move on to the diff.

The first two lines in the diff,
--- a/<file>
+++ b/<file>
are the two files that diff is trying to compare. The file's path (shown here as <file>) is prefixed with ‘a/’ and ‘b/’ in the two lines because Git is comparing your copy of the file with the copy in the index (which, if nothing is staged, matches HEAD). Git represents these two versions of the file as being in two folders ‘a/’ and ‘b/’, just as a way of differentiating them. In reality, if you had used just diff, you would have provided two files physically present on the filesystem.

The first range information line is:
@@ -1,6 +1,7 @@
In the range information line, the “-1,6” indicates that the original file’s context provided starts from the first line of the file, and 6 lines of context are provided. The “+1,7” indicates that the new file’s context provided starts from the first line of the file, and 7 lines of context are provided. Why 7? Because of the addition of the grade variable, that is only present in the new file.
The second range information line is:
@@ -12,5 +13,9 @@ class Employee {
In this range information line, the “-12,5” indicates that the original file’s context provided starts from the 12th line of the file, and 5 lines of context are provided. The “+13,9” indicates that the new file’s context provided starts from the 13th line of the file, and 9 lines of context are provided. Why is the starting line position in the new file 13? Because of the addition of the grade variable previously. Why 9 lines of context? Because of the addition of the toString() method in the new context.
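To make the format concrete, here is a minimal Python sketch that pulls the four numbers out of a range information line. The function name parse_range_line and the dictionary keys are my own choices for illustration, not part of diff or Git. It also allows for the count being left out, which unified diffs do when the count is 1.

```python
import re

# Matches a range information line such as "@@ -12,5 +13,9 @@ class Employee {".
# The ",count" part is optional; when it is missing the count is taken as 1.
RANGE_LINE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")


def parse_range_line(line):
    """Return the start/count pairs of a range information line as a dict."""
    match = RANGE_LINE.match(line)
    if match is None:
        raise ValueError("not a range information line: %r" % line)
    old_start, old_count, new_start, new_count, trailer = match.groups()
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
        # Text after the closing @@ (Git puts the enclosing class/function here).
        "section": trailer.strip(),
    }


if __name__ == "__main__":
    print(parse_range_line("@@ -1,6 +1,7 @@"))
    print(parse_range_line("@@ -12,5 +13,9 @@ class Employee {"))
```

Running it on the two range information lines above gives back the same start positions and line counts that we read off by hand.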

So what’s a hunk?

Now that you’ve understood the diff output, it becomes easy to understand hunks. A hunk is simply the term for a range information line together with the change information that follows it, up to the next range information line.
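The definition of a hunk translates almost directly into code. The sketch below is only an illustration: split_hunks is a hypothetical helper name, the sample diff is made up, and it assumes the diff covers a single file (with several files, the header lines of the next file would need separate handling).

```python
def split_hunks(diff_lines):
    """Group the lines of a single-file unified diff into hunks.

    Each hunk is a list whose first element is the range information line.
    Header lines that appear before the first hunk are skipped.
    """
    hunks = []
    current = None
    for line in diff_lines:
        if line.startswith("@@"):
            # A range information line always starts a new hunk.
            current = [line]
            hunks.append(current)
        elif current is not None:
            current.append(line)
    return hunks


# A made-up diff, only for demonstration.
SAMPLE_DIFF = [
    "--- a/example.txt",
    "+++ b/example.txt",
    "@@ -1,3 +1,4 @@",
    " first",
    "+inserted",
    " second",
    " third",
    "@@ -10,2 +11,2 @@",
    "-old",
    "+new",
    " tail",
]

if __name__ == "__main__":
    for number, hunk in enumerate(split_hunks(SAMPLE_DIFF), start=1):
        print("Hunk", number, "starts with", hunk[0], "and has", len(hunk) - 1, "body lines")
```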

Wednesday, July 03, 2013

Restaurants: A novel way to remember orders!!

On a recent trip to the US, we used to go out for lunch with our clients to various places.

One hotel we went to seemed to be pretty popular, and there was usually a crowd during lunch. On this particular day, we sat down and placed our orders. We were a huge group, so our entire order was not easy to remember. But I remembered reading somewhere that waiters were good at remembering orders, and hence I decided to ignore it. "She has noted it down on a notepad anyway, so it shouldn't be a problem for her," I thought.

Our first order arrived, carried by a different waiter from the one who took the order. She came straight to the table, and placed it right in front of the person who had asked for that item.

I was surprised that even though she was different from the one who took our order, she knew which customer had ordered that item. I put it down to the original waiter informing the new one of who had placed that order.

The subsequent orders came and the same thing happened again and again. I was surprised. I looked around at another table, and after some time, noticed the same pattern. Different waiters would serve the same tables, and each waiter knew which customer had ordered what. These same waiters were also serving other tables, and even there, they seemed to know who had ordered what.

"Can they really remember to such an extent?" I wondered. I didn't think I could.


I forgot about the incident and was reminded of it on another day, when we went to a mobile diner of sorts. The mobile diner is just the same as the street food stalls and vans that we see in India.

I placed my order, and the lady gave me my copy of the receipt she wrote the order on. Here it is:

Notice the top row of figures?

There are various shapes with some numbers arranged around them. There is also a circled 'S' symbol.

The shapes are the tables in the restaurant. The numbers around the shapes are the customers that can sit at those tables. Each customer is assigned a number. The circled 'S' symbol is the waiter. Its expansion is probably "server".

When the waiter arrives to take your order, she stands in the position marked by the circled 'S'. She then notes down your order according to the position in which you sit. Thus, if you are the first on her left side, your order is marked against number 1.

This paper is then maintained until the orders are ready, at which point the waiter brings the food to the table along with the paper. Since she knows the name of the food, it's easy to find the customer's position from the paper. She then serves it directly to the customer!! This ensures that any free waiter can serve the food back to the table, and it is not necessary to wait for the original waiter to serve, or to ask the original waiter whom to serve to.

I saw this for the first time in my life in the US, and am not sure whether it exists in India. In most Indian restaurants I have been to, when the waiter comes to serve me food, I am the one indicating to the waiter which food should go to whom.

Thursday, January 24, 2013

JAXB - Generating a <simpleType> with more than 256 <enumeration>s

So this was a strange error that we faced a few weeks ago.

The client we work for has various teams with each exposing their functionality to other teams via web services. So, in effect, a web application can be built, with it talking to various web services to get work done. Our work that day was to make a new web service. This was similar to another web service, with certain differences in inputs and functionality between the two. For various reasons, we decided to create a copy of the first web service's WSDL file and make the changes in inputs to the second WSDL.

While we were doing so, we found that the previous WSDL had a field for accepting the country code, but its data type was marked as string. We felt that this could lead to wrong country codes in the database as people could input any value. Our database also had a master table that stored the country codes. While the web service code did verify the input against the table, we decided to change from string to a simpleType that had restricted elements. This would mean that our clients would never be able to provide invalid values.

Basically, we wanted to change from:

<element name="countryCode" type="string"></element>
to:
<element name="countryCode" type="CountryCode"></element>
with the CountryCode type being defined thus:
<simpleType name="CountryCode">
  <restriction base="string">
    <enumeration value="IN"></enumeration>
    <enumeration value="US"></enumeration>
    ...
  </restriction>
</simpleType>

Since our database has 262 country codes, we decided to list all of them, thus having 262 <enumeration> entries in CountryCode. This wasn't as much work as we had initially thought, thanks to copy-paste and IDEA's column selection feature.

We use Apache's cxf-codegen-plugin in our project to generate the Java classes that do much of the XML-Java conversions. cxf-codegen-plugin ties into Maven's generate-sources phase to generate the Java classes. So when we ran mvn generate-sources, we expected an enum type called CountryCode with 262 fields.

In reality, the class was not generated at all.

I immediately had a suspicion over the number of enum fields, because I had never written or seen a Java enum with that many fields. So we trimmed the simpleType to one entry and ran mvn generate-sources, and the result was that the CountryCode class was generated, with one field. When we brought back the entire list, no class was generated. So we commented out the entire list and slowly uncommented a few entries (from the top) one by one to see at which point the error occurred. The Java file was generated fine all along until we reached the final few entries (about 6 or so). At that point, the Java file was not generated.

Again, the thought of some count limitation entered our heads. We were also entertaining the possibility of some character we pasted being of a different encoding or some whitespace character inadvertently getting into our code because of the copy-paste. To rule out the second possibility, we deleted the <enumeration> entries for the 6 country codes and manually keyed them in ourselves. Still, it did not work. To further rule out this possibility, we commented all entries and then slowly uncommented entries from the bottom up. The Java file was generated until we reached the top few entries, at which point it failed.

So we were back to our count hunch.

We were thinking that maybe WSDL had an issue with so many <enumeration>s. We didn't think it would be so, but we decided to check anyway. The WSDL spec did not mention any restriction on the number of <enumeration> tags in a <simpleType>. So we felt it had to be an issue with either the cxf-codegen-plugin or Java. Googling revealed that Java had a limit on the number of fields in an enum, and that was 65535. Since we were much below this, we ruled out Java as the problem.

So now the only thing left was the cxf-codegen-plugin. Googling revealed that it internally made use of JAXB. Further Googling brought up this link which said that you had to add the typesafeEnumMaxMembers attribute to your <globalBindings> tag to enable it to generate more than 256 elements in an enum type. This <globalBindings> tag is present in the bindings.xjb file in our project. We set typesafeEnumMaxMembers to 300 and found that we were able to generate the file, with it having all 262 enum elements!!

<globalBindings typesafeEnumMaxMembers="300"/>

This was a great relief since we had been Googling for many hours and had become frustrated. Googling further, we learnt more about JAXB and the xjc tool. I was aware that JAXB was a tool that could be used to do the conversions from XML to Java and vice versa, but I had never really delved into it and learnt more about it. Hence xjc was new to me. In the end, I understood that it was xjc that did the job of generating the Java classes. You could customise the way xjc generates the classes by creating an external bindings file, which had to have the extension '.xjb'.

And that's where the file 'bindings.xjb' in our project came in. You can inform JAXB about the presence of this binding file by passing the file name to the -b parameter of the xjc command. Since we were using the cxf-codegen-plugin and not using the xjc command directly, we configured these arguments via the <executions> tag of the <plugin> tag in pom.xml. Basically, we did this:


One thing that made us wonder was why there was a limit in the first place, and why the default value was 256. We were not able to find any answers for this, but the JAXB spec itself lists the default value as 256. I read somewhere on the Internet that this was because having a Java enum with 256 entries is unmanageable and unmaintainable. But we felt that even 100 - 200 entries should be unmanageable - in that case, why isn't the default value somewhere between 100 and 200? Why specifically 256?

Friday, October 29, 2010

Great circle routes

In late March, my company informed me I had to travel to the US. They asked me to get all required stuff ready. By the time my visa was ready and I was able to book tickets, it was already the third week of April. It looked like I could book tickets for any day from the 4th week of April or the 1st week of May only.

On April 14th, the Eyjafjallajökull volcano erupted.

As you probably know, all airline schedules went haywire. However, I was not worried much. After all, the volcano was in Iceland, and the affected areas were mostly in Europe. Surely, my flight to the US wouldn't be travelling over Europe!! Why should it? When travelling from India to the US, I thought, my flight would probably take the Saudi Arabia - Egypt - Algeria - Atlantic Ocean - US route, wouldn't it? After all, when you have a map of the world in front of you, that seems to be the most straight and efficient route. A map of the route I thought my flight would take is shown below:

Of course, if the ash cloud grew to the extent where it began to intrude into North Africa, then I would have some problems - but I thought I would think about what to do if it ever came to that.

By the time I was ready to book my tickets, the situation had eased a little - flights were allowed so long as they flew via routes where the ash was less concentrated. When I went to the travel desk, I was told there were no bookings being done, as all flights were cancelled. I was somewhat surprised and reminded the travel desk that flights were being allowed up in the air. The travel desk replied that though flights were being allowed, the airlines were concentrating on clearing the backlog of passengers first. I reported this to my manager, who told me that as my travel was urgent, I would have to get tickets somehow.

I went back to the travel desk. "I need a ticket to the US".

"No sir... as we already said, the airlines are not accepting bookings. They are only trying to clear the backlog".

"Ok... which route are you considering?"

The lady mentioned some routes via Europe and the Middle East. I understood it was impossible via Europe as that was the most affected area, but why were there no tickets for routes via Middle East?

Me: "Can't you book on the Chennai - Singapore - US route?" I was thinking Singapore to US would probably fly Philippines - Pacific Ocean - US, which meant that they would avoid the ash cloud.

"No sir... no bookings".

Confused, I tried to be even more clear. I said, "No.. I mean the Chennai - Singapore - Tokyo - US route. Surely, there should be some tickets there!!"

"No sir... bookings not allowed".

Even more confused now, I asked the lady why bookings were not allowed on that route. Surely the ash cloud was not affecting those areas!!

"I don't know sir... but bookings are not allowed".

Not wanting to argue any further, I reported this to my manager. There followed a long series of trips to and from the travel desk, trying desperately to book a ticket ASAP. Every day I visited them at least once, and in some cases twice. No change. Another manager suggested booking on the India - Johannesburg - US route, which the lady frowned upon. In between I learnt that my company had an upper limit on the total cost of a ticket booking, which meant that some routes were effectively removed from consideration.

Finally, after one or two weeks, my tickets were confirmed. My route was Chennai - Doha - Washington by Qatar Airways - a hop through the Middle East. I wondered why this ticket was not available earlier.

With packing and other travel-related work, this issue went to the back of my mind. Finally, the day came, and I boarded the Qatar flight to Doha. The 5-hour journey was uneventful and I landed in Doha. Two hours later, I boarded my Doha - Washington flight.

Once settled into my seat, I looked around and noticed that this plane had TV screens behind each seat, and at the beginning of every passenger section. These were showing the route we would take and the route shown was this:

(Note: The route shown here is not the exact route my flight took. Though it has been a few months since my flight, I do remember the route going over Finland as well as Iceland and Greenland. But you do get the general idea).


I didn't believe it at first!! Surely this must be a mistake. The flight was not going to travel over the Mediterranean Sea or North Africa. This meant that the flight was taking a roundabout route. I immediately rejected what the display was showing and thought to myself to note what route the plane actually took.

We departed. As I had a meeting the day after I landed in the US, I had planned to have naps during the flight to avoid jet lag as much as possible. I had a short nap. Lunch was served. All along, I kept watching the display. The flight took the route shown before. I thought at some point, the flight would turn and go on the route I had thought it would take, but no, the flight kept going on and on on the route shown, until many hours later, I reconciled myself to the fact that the flight was not going to change direction.

I was angry - I admit it. I was needlessly being kept in a flight for 14 hours when a shorter direct route existed, one that would take less time. But I soon realized I could be mistaken. No pilot would do that; he could be reprimanded by the airline. I also knew that flight plans were often prepared by somebody other than the pilot, and if that somebody had prepared this route, the pilot would want to know why. Controllers on the ground would also want to know why the flight was taking this route. And more importantly, I remembered reading somewhere that fuel costs alone were a significant percentage of an airline company's expenses - no pilot would be foolish enough to fly a route longer than the shortest one, unless there were reasons. To top it all, this was my first international flight, and there was always a possibility that I might not know something.

In short, everything was loaded against me. If my thoughts about the flight's route were right, then it had to be a very very exceptional case, and I would hear about it on landing; otherwise, I was surely wrong. I suspected the latter.

Realization dawned somewhere over the Atlantic, I guess.

Great circle

The initial route I had arrived at (Chennai - Saudi Arabia - Egypt - Algeria - Atlantic Ocean - US) had been based on a paper map of the world. I had plotted the most direct route if the Earth had been flat, as shown on a paper map. But the Earth is not flat - it is a sphere, which means that Doha and Washington were on opposite sides of the Earth. On this spherical image of the Earth, my expected route would look like this:

Adding the route my flight actually took to the above map makes it look like this:

Clearly, the actual route is a straight line, rather than the one I initially thought of, which is curved, and travels a greater distance. And we all know from our geometry class that a straight line is the shortest distance between any two points. Note the route taken - it passes over Europe, crosses the Atlantic Ocean and enters North America over Canada, which is also the route my flight roughly took.

So yes, I was wrong and the flight route taken was the shortest one. But if my new understanding was right, then it had to be documented somewhere. A search on Google/Wikipedia should reveal whether I was right. And yes, Wikipedia has an article on it. Such routes are known as "great circle routes", since the shortest path joining any two points on a sphere runs along an arc of what geometry calls a great circle. A great circle on a sphere plays the role that a straight line plays in plane geometry.
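For the curious, the great circle distance can be worked out with the haversine formula. The Python sketch below is my own illustration and not how the maps in this post were generated. It treats the Earth as a perfect sphere with a mean radius of 6371 km, and the Doha and Washington coordinates are approximate airport positions that I am assuming. It also computes a point part of the way along the route, to show how far north the path goes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfectly spherical Earth

# Approximate (latitude, longitude) in degrees; assumed values for illustration.
DOHA = (25.27, 51.56)
WASHINGTON = (38.95, -77.46)


def _central_angle(lat1, lon1, lat2, lon2):
    """Angle in radians between two points given in radians (haversine formula)."""
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, h)))


def great_circle_km(p, q):
    """Great circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    return EARTH_RADIUS_KM * _central_angle(lat1, lon1, lat2, lon2)


def intermediate_point(p, q, fraction):
    """Point (lat, lon in degrees) at the given fraction of the way from p to q.

    Not defined when p and q are identical or exactly opposite each other.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    d = _central_angle(lat1, lon1, lat2, lon2)
    a = math.sin((1 - fraction) * d) / math.sin(d)
    b = math.sin(fraction * d) / math.sin(d)
    # Blend the two positions as 3D vectors, then convert back to lat/lon.
    x = a * math.cos(lat1) * math.cos(lon1) + b * math.cos(lat2) * math.cos(lon2)
    y = a * math.cos(lat1) * math.sin(lon1) + b * math.cos(lat2) * math.sin(lon2)
    z = a * math.sin(lat1) + b * math.sin(lat2)
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


if __name__ == "__main__":
    print("Distance (km):", round(great_circle_km(DOHA, WASHINGTON)))
    print("Midpoint (lat, lon):", intermediate_point(DOHA, WASHINGTON, 0.5))
```

With these assumed coordinates the distance comes to roughly 11,000 km, and the midpoint lies well north of both cities, which agrees with a route that passes over Europe rather than North Africa.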

Good.. my flight was like this. How about other flights? For example, Europe to America. Let us take Frankfurt - Washington since we have a Chennai - Frankfurt flight and there is a chance I could have flown on it.

Great circle again!!

Ohkay, now how about Singapore - Washington? Does it also take a great circle route?

Aha, it does!! So this is the reason why the travel desk could not book on this flight. Probably, this flight too was cancelled!!

Interestingly, this flight seems to pass right over the North Pole. That should be exciting - imagine sitting on an airplane, having your lunch and looking at the display in front of you, which says you are flying over the North Pole. How thrilling would that be? In fact, while there is no flight between Singapore and Washington currently, we do have a Singapore - New York flight operated by Singapore Airlines, and it passes within a few miles of the North Pole. (see here for proof).

Interesting, but does this work for flights in the Southern Hemisphere too? Let us take a flight from Sao Paulo (Brazil) to Sydney (Australia).

Woo... the flight passes over Antarctica!!

Update (11th Feb 2011): So the lesson here is that if you have a flight between two cities that lie in the same hemisphere, then the flight route is plotted as a great circle route (assuming the weather is fine along the route; otherwise, there would be deviations). Note the condition - lie in the same hemisphere. Why should the cities lie in the same hemisphere? Do flights travelling between cities across the Equator not have to travel via great circle routes? Yes, they do have to travel along great circle routes, but that would roughly approximate the route you would draw on a paper map.

Nope, I shall put up my finger and accept that I was wrong in that last paragraph. I had assumed that flights that cross the Equator would more or less follow the straight line you drew between the two cities on a flat Earth. I guess I made this assumption on the fact that the route between Doha and Sao Paulo is like this:

which is erm... roughly a straight line..

Unhappily, just this example is not enough to argue that trans-equatorial flights do not fly on great circle routes, or to argue that their routes are roughly equivalent to straight lines. One counter-example is enough to show otherwise - London to Sydney. I expected the route to be somewhat like this:

But in reality the route turns out to be this:

Why is this so? Again, the route I expected to see is because my mind still thinks of the world as a flat paper map. But of course the Earth is not flat, which means that the route you would get is the second one. Here is how the route would look if we had rightly visualized the Earth as a sphere in our minds:

So the lesson here is to think of the Earth as a sphere when mapping flight routes between two points, wherever those two points may be and whatever the distance between them!!

(All flight routes generated by the excellent flight route mapping website, Great Circle Mapper).

UPDATE (16th Dec 2012): Another proof of this is FlightRadar24, a website that shows flights travelling across the globe in realtime. You can see for yourselves the routes flights take.