Thursday, June 08, 2017

[CSS] How are conflicting styles resolved?

If you have worked in CSS, then you’ll know that you can assign a CSS property using the syntax:
property-name: value
For example, if you have a <span> tag with ID ‘content’, for which you want to assign the color green, you’d add this in your CSS file:
#content { color: green; }
There are other ways you can specify the same property:
span#content {color: green;}, and
.content {color: green;} in combination with <span class="content">lorem ipsum</span>

Here's the interview question

What happens though when you have multiple instances of the same property being set & they all apply to the same HTML tag too? Here’s an example:
Consider this tag,
<span id="content" style="color: blue;">some content</span>
while the CSS definition in the associated CSS file that can match the element is:
#content {color: green;}
Since multiple styles match, which one will the browser render? Answer: The text in the span element will be rendered in blue.
Why? Why did the browser decide to apply blue? As per the CSS spec, there are two main aspects a browser considers when deciding which of several competing styles to apply. They are: 1) Cascading order, & 2) Specificity. (If both of these end in a tie, the order in which the definitions appear decides; we'll see this at the end of the post.) We'll first look at Cascading order and later in the post, Specificity.

Cascading order

In English, the term "cascade" is used to describe a process where there are multiple steps. For example, a cascading waterfall is one in which water flows down multiple steps.
If that is the case, what does "Cascading Style Sheets" mean? What steps are there in CSS? It turns out there are multiple ways through which style definitions for a web page can be assigned. They are: author, user & user agent.
  • Author styles are those which all software developers know - they are created by the authors of the web page as CSS files or style attributes in HTML tags.
  • User styles are those styles which users of web browsers can configure on their browser. For example, users can configure their browsers to render particular fonts by replacing them with other fonts - this is particularly useful from an accessibility standpoint.
  • User agent styles are those styles that are provided by default by the browser. For example, if no colour information is provided, then text is rendered black on a white background by default - this is an example of user agent styling.
The "cascade" in Cascading Style Sheets flows thus: If there are conflicts in property definitions across user, author or user agent style definitions, then the precedence is as follows:
Author > User > User agent

Example 1

In this example, we’re going to determine what happens if a user CSS file has a definition that conflicts with a definition in the user agent's default CSS file. The user agent we’re going to use is Internet Explorer. It already has a user agent CSS file (this is why a plain HTML file without any styling will render black text on a white background). We will now change the way IE renders the text color inside <div> tags by default by providing a user CSS file.
Create a file by name, my_style.css. The content of this file is just this one line:
div{color:red}
We will now tell IE to use this file from now on for all web pages. The way to do so is this:
  • Open Internet Explorer
  • Click on the Tools menu & choose Internet Options
  • Click on the General tab & choose Accessibility.
  • Under the User style sheet section, enable the Format documents using my style sheet checkbox.
  • Now click Browse… under the same checkbox and choose my_style.css.
  • Restart Internet Explorer.
We now need to create an HTML file that we can load into the browser to test that IE uses my_style.css. Create a file by name, test_my_style.html. The content of this file is:
<html>
  <head>
     <title>Testing user styles</title>
  </head>
  <body>
      <div>This is a test file to test user styles.</div>
  </body>
</html>
Opening this file in Internet Explorer renders the text in red.
What happened here? The user agent, by default, will render text inside <div> tags as black-colored text. Our user file, my_style.css, overrode that, thus creating a conflict. IE followed the CSS spec, which states that User CSS property definitions have priority over user agent CSS property definitions, and rendered the text in red.

Example 2

What happens if we introduce a further conflict by having an author-defined CSS file? For this, we will create another CSS file, author_style.css, where we will provide the following definition:
div {color:blue}
We will also change test_my_style.html to include author_style.css as follows.
<html>
  <head>
    <title>Testing user styles</title>
    <link href="author_style.css" rel="stylesheet">
  </head>
  <body>
    <div>This is a test file to test user styles.</div>
  </body>
</html>
Opening this file in Internet Explorer renders the text in blue.
What happened here? The user agent, by default, will render text inside <div> tags as black-colored text. Our user file, my_style.css, overrode that, thus creating a conflict. The author’s CSS file, author_style.css, overrode that even further, setting up another conflict. IE followed the CSS spec, which states that Author CSS property definitions have priority over user and user agent CSS property definitions, and rendered the text in blue.

An exception

The only exception to the cascade order above is if the property definition is marked as !important, in which case user definitions take precedence over author definitions for that property. There are no property definitions marked !important in the user agent CSS file.
Let’s look at an example: We will reuse the same files as before, but we will change my_style.css to this:
div{color:red !important}
Now if we open our test_my_style.html in IE, the text is rendered in red.
What happened here? The user agent, by default, will render text inside <div> tags as black-colored text. Our user file, my_style.css, overrode that, thus creating a conflict. The author’s CSS file, author_style.css, overrode that even further, setting up another conflict. However, IE noticed the !important in my_style.css and followed the CSS spec, which states that User CSS property definitions marked !important have priority over all other CSS property definitions, and rendered the text in red.
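The cascading order described so far, including the !important exception, boils down to a ranking. Here is a minimal Java sketch of it, assuming the CSS 2.1 ordering (user agent < user < author < author !important < user !important). The class and method names (CascadeOrigin, rank, winner) are made up for illustration, and user agent !important declarations are not modelled, since the post notes the UA file has none.

```java
import java.util.Arrays;
import java.util.List;

public class CascadeOrigin {

    // The three origins described in the post.
    enum Origin { USER_AGENT, USER, AUTHOR }

    // One competing definition for the same property, e.g. "color: red !important" from the user.
    static final class Declaration {
        final Origin origin;
        final String value;
        final boolean important;

        Declaration(Origin origin, String value, boolean important) {
            this.origin = origin;
            this.value = value;
            this.important = important;
        }
    }

    // Precedence rank (higher wins), following the CSS 2.1 cascading order:
    // user agent < user < author < author !important < user !important.
    // Assumption: user agent !important is not modelled (the post says the UA file has none).
    static int rank(Declaration d) {
        switch (d.origin) {
            case USER_AGENT:
                return 0;
            case USER:
                return d.important ? 4 : 1;
            case AUTHOR:
                return d.important ? 3 : 2;
            default:
                throw new IllegalArgumentException("Unknown origin: " + d.origin);
        }
    }

    // Returns the declaration that wins on origin and importance alone.
    static Declaration winner(List<Declaration> declarations) {
        Declaration best = null;
        for (Declaration d : declarations) {
            if (best == null || rank(d) > rank(best)) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Declaration ua = new Declaration(Origin.USER_AGENT, "black", false);
        Declaration author = new Declaration(Origin.AUTHOR, "blue", false);

        // Example 2: user "red" vs author "blue"
        Declaration user = new Declaration(Origin.USER, "red", false);
        System.out.println(winner(Arrays.asList(ua, user, author)).value); // blue

        // The exception: user "red !important" vs author "blue"
        Declaration userImportant = new Declaration(Origin.USER, "red", true);
        System.out.println(winner(Arrays.asList(ua, userImportant, author)).value); // red
    }
}
```

Modelling the cascade as a single integer rank keeps the !important exception in one place instead of scattering special cases through the comparison logic.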

Specificity

The cascading order above can still leave conflicts, since a single user or author stylesheet can itself contain conflicting style definitions. To resolve these, CSS provides another mechanism that browsers use - specificity. While the spec defines how specificity is calculated, it doesn't give a plain-English definition of what it means, so here is mine: specificity is a measure of how specific a style definition is, that is, roughly how narrowly its CSS selector targets HTML elements. The fewer elements a selector can match, the more specific it is; the more elements it can match, the less specific it is.

Calculation of specificity

The calculation of specificity is done in the following manner:
Assume there are four numbers separated by commas, and their initial values are zero:
0,0,0,0
The first number represents the presence of a style attribute in the element's HTML. If a style attribute is present, then the first number becomes 1, otherwise 0.
The second number represents the number of ID selectors (for example, #content) in the selector.
The third number represents the number of class selectors (.content), attribute selectors ([data-name]) and pseudo-classes (:hover) in the selector.
The fourth number represents the number of element names (div) and pseudo-elements (::first-letter) in the selector.
Unlike in the decimal system, if a number reaches the value 10, then it does not carry over to the preceding number. Thus, specificity values like 0,10,0,9 are perfectly valid.
Now that we know what specificity is, let’s take a look at some example CSS definitions, and try to understand what specificity value they evaluate to:
Example 1: div.content {color:red}. It is not a style attribute in an HTML tag, nor does it have any HTML IDs mentioned in the selector. Thus the first two numbers are 0,0. It has a class selector (.content), and it also has an HTML element mentioned (div). Thus, the final two values of the specificity are 1,1. Hence its final specificity value is 0,0,1,1.
Example 2: #content::first-letter. It is not a style attribute in an HTML tag, but it has an HTML ID mentioned in the selector. Thus the first two numbers are 0,1. It has a pseudo-element (::first-letter), and it doesn't have any HTML elements mentioned. Thus, the final two values of the specificity are 0,1. Hence its specificity value is 0,1,0,1.
Example 3: div[data-name="Tom"][data-url="/member/1"]. It is not a style attribute in an HTML tag, nor does it have any HTML IDs mentioned in the selector. Thus the first two numbers are 0,0. It has two attribute selectors (data-name & data-url), and it has 1 HTML element mentioned (div). Thus, the final two values of the specificity are 2,1. Hence its specificity value is 0,0,2,1. (The attribute values are quoted because an unquoted value such as /member/1 is not a valid CSS identifier, which would make the whole selector invalid.)
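The counting rules above are mechanical enough to sketch in code. Below is a minimal, simplified Java scanner, assuming plain selectors like the three examples. The name SpecificityCalculator is hypothetical, and the sketch deliberately ignores some cases: the arguments of :not()/:is() are not counted, legacy single-colon pseudo-elements such as :before are counted as pseudo-classes, and attribute values containing ']' are not handled. The first number is always 0 because it only applies to style attributes.

```java
import java.util.Arrays;

public class SpecificityCalculator {

    // Returns {a, b, c, d} for a selector written in a stylesheet.
    // a is always 0 here: it is only 1 for a definition in a style attribute.
    static int[] specificity(String selector) {
        int ids = 0, classLike = 0, elementLike = 0;
        int i = 0;
        int n = selector.length();
        while (i < n) {
            char ch = selector.charAt(i);
            if (ch == '#') {                                  // ID selector
                ids++;
                i = skipName(selector, i + 1);
            } else if (ch == '.') {                           // class selector
                classLike++;
                i = skipName(selector, i + 1);
            } else if (ch == '[') {                           // attribute selector
                classLike++;
                int close = selector.indexOf(']', i);
                i = close < 0 ? n : close + 1;
            } else if (ch == ':' && i + 1 < n && selector.charAt(i + 1) == ':') {
                elementLike++;                                // pseudo-element, e.g. ::first-letter
                i = skipName(selector, i + 2);
            } else if (ch == ':') {                           // pseudo-class, e.g. :hover
                classLike++;
                i = skipName(selector, i + 1);
                if (i < n && selector.charAt(i) == '(') {     // skip arguments, e.g. :nth-child(2n+1)
                    int close = selector.indexOf(')', i);
                    i = close < 0 ? n : close + 1;
                }
            } else if (Character.isLetter(ch)) {              // element name, e.g. div
                elementLike++;
                i = skipName(selector, i);
            } else {
                i++;                                          // whitespace, combinators (> + ~) and *
            }
        }
        return new int[] {0, ids, classLike, elementLike};
    }

    // Advances past an identifier such as "content" or "first-letter".
    private static int skipName(String s, int i) {
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '-' || c == '_')) {
                break;
            }
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String[] selectors = {"div.content", "#content::first-letter",
                "div[data-name=\"Tom\"][data-url=\"/member/1\"]"};
        for (String s : selectors) {
            System.out.println(s + " -> " + Arrays.toString(specificity(s)));
        }
    }
}
```

Running it should print [0, 0, 1, 1], [0, 1, 0, 1] and [0, 0, 2, 1], matching the three worked examples.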

Resolving conflicts with specificity

Given two specificity values, you can compare them to find out which one is greater. Compare the first numbers of both values: the value with the greater first number is the greater specificity. If the first numbers are the same, the browser moves on to compare the second numbers of both values, and so on, until it finds a difference.
Here are some examples:
1,0,0,0 is greater than 0,10,0,0
0,10,0,0 is greater than 0,0,20,0
How is specificity helpful in resolving conflicts? As per the CSS spec, when conflicting definitions come from the same origin (and have the same importance), browsers choose the definition whose selector has the higher specificity.
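The left-to-right comparison can be written as a small Java method. This is a sketch with a hypothetical class name (SpecificityCompare); it treats each specificity value as an array of four ints and compares each position as a whole number, which is why a 10 never carries into the position to its left.

```java
public class SpecificityCompare {

    // Compares two specificity values (a,b,c,d) number by number, from left to right.
    // Each number is compared on its own, so 10 in one position never carries
    // over into the position to its left.
    static int compare(int[] x, int[] y) {
        for (int i = 0; i < 4; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0; // equal specificity: something else (the order of appearance) must decide
    }

    public static void main(String[] args) {
        System.out.println(compare(new int[] {1, 0, 0, 0}, new int[] {0, 10, 0, 0}) > 0); // true
        System.out.println(compare(new int[] {0, 10, 0, 0}, new int[] {0, 0, 20, 0}) > 0); // true
        System.out.println(compare(new int[] {0, 0, 0, 1}, new int[] {0, 0, 0, 1}));      // 0
    }
}
```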

Example

Let’s take the example in the interview question above:
In the HTML, we have:
<span id="content" style="color: blue;">some content</span>
while in the CSS file, we have:
#content {color: green;}
Constructing the specificity for the style definition in the HTML style attribute, we get:
1,0,0,0
Constructing the specificity for the CSS style definition, we get:
0,1,0,0
Because the first specificity value is greater than the second, the style definition in the style attribute of the HTML tag wins.

Is it possible to still have conflicts?

Yes. For example, there could be two definitions in an author CSS file which target the same elements and have the same specificity. In such cases, the CSS spec says that the definition that appears later wins.
An example:
Let’s say that we have two CSS definitions as below:
div {color:blue}
div {color:red}
for this HTML,
<html>
  <head>
     <title>Testing user styles</title>
  </head>
  <body>
     <div>This is a test file to test user styles.</div>
  </body>
</html>
Both CSS definitions evaluate to a value of 0,0,0,1.
In this case, the browser renders the text in red, because div {color:red} appears after div {color:blue}.
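Putting specificity and order of appearance together, here is a minimal Java sketch of picking the winner among definitions that are already known to come from the same origin. The names (LaterWins, Rule, position) are hypothetical, and the positions assigned in the interview example are arbitrary: they are there to show that higher specificity wins even when the other rule appears later.

```java
import java.util.Arrays;
import java.util.List;

public class LaterWins {

    // A competing definition for one property, assumed to come from the same origin.
    static final class Rule {
        final int[] specificity; // a,b,c,d as calculated in the post
        final int position;      // order of appearance: higher means it appears later
        final String value;

        Rule(int[] specificity, int position, String value) {
            this.specificity = specificity;
            this.position = position;
            this.value = value;
        }
    }

    // Left-to-right comparison of two specificity values.
    static int compareSpecificity(int[] x, int[] y) {
        for (int i = 0; i < 4; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // Higher specificity wins; if specificities are equal, the rule that appears later wins.
    static Rule winner(List<Rule> rules) {
        Rule best = null;
        for (Rule r : rules) {
            if (best == null) {
                best = r;
                continue;
            }
            int bySpecificity = compareSpecificity(r.specificity, best.specificity);
            if (bySpecificity > 0 || (bySpecificity == 0 && r.position > best.position)) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // div {color:blue} followed by div {color:red}: both 0,0,0,1
        List<Rule> sameSpecificity = Arrays.asList(
                new Rule(new int[] {0, 0, 0, 1}, 0, "blue"),
                new Rule(new int[] {0, 0, 0, 1}, 1, "red"));
        System.out.println(winner(sameSpecificity).value); // red

        // Interview question: style attribute (1,0,0,0) vs #content (0,1,0,0)
        List<Rule> interview = Arrays.asList(
                new Rule(new int[] {1, 0, 0, 0}, 0, "blue"),
                new Rule(new int[] {0, 1, 0, 0}, 1, "green"));
        System.out.println(winner(interview).value); // blue
    }
}
```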

Friday, October 07, 2016

Certifications - "Do they benefit me?" is the more important question

There's a lot of controversy regarding certifications.
Some people think certifications have no value. Much of this seems to stem from the possibility that a person with a certification may not actually have absorbed enough knowledge to be effective, or for his employers to reap the benefits. Some others think that, like all exams, it's easy to cheat and get the certification. Some others think that the fact that some certifications have limited validity means their value to employers expires within a certain period of time. That is, unless the person demonstrates a constantly-learning attitude even without certifications coming into play, having a certification won’t help.
In my opinion, while all of these reasons may be correct, one shouldn’t ignore certifications. Here are some reasons why:
  • To folks outside the industry, certifications provide proof that you have skills and those skills have been validated by a standards authority. To explain that sentence better, I quote here something I learnt on the Internet: anyone can drive vehicles without a license, but when you want to hire a driver, you’ll hire one with a license. When looking for a driver, you’ll avoid drivers without a license because you do not want to add to the problems you already have on your plate (like getting into accidents). The only way to achieve this is to look for someone who has had his skills validated by a standards authority (here, the government licensing authority).
  • Another reason for certifications is that they are a great way to have a deep understanding of the technology involved. If you’re like most developers, then you probably have worked in a lot of technologies over the years. After doing so for a few years, some developers decide that it’s better to focus on a technology and become an expert at it, rather than jump from one technology to the other and skim only the basics of each technology. Once such a decision is made, the best way to achieve it within a reasonable timeframe is to get a certification in the technology. To get the certification, one will have to look out for some courses/books related to the certification. These courses/books teach basic & advanced concepts and also have mock exams where you can test your skills before taking the actual exam. Taking these mock exams (with all honesty & seriousness) helps in building your understanding of the technology, leading to better career opportunities.
    A lot of folks will say, “This isn’t really different from the usual advice that developers must read books.” True, but taking a certification really makes you understand a technology, since you have to pass the exam (or at least the mock tests), rather than just reading a book & potentially forgetting the concepts later.
  • If you’re totally new to the software field, having a certification helps to get a foot into the door. It demonstrates that you took extra effort to understand something, and that you have some basic knowledge. Keep in mind that that is all a certification can do - if you can’t code even though you have a certification, then you’ll not have a chance.
I want to expand on that last point. As Jeff Atwood says, software is a field where you can expect to work on multiple technologies & frameworks inside that technology. You’re often expected to demonstrate that you can do great work using a technology you may have only a fair knowledge of. This means there’s going to be a frustrating period in which you ramp up on the technology only to be moved to a new technology later. It also means you’re going to get co-workers who are new to the technology, and are ramping up really slowly, making you wish they had read up on the basics before joining your team. In both cases, certifications come to your rescue - they guarantee that there is some basic knowledge that you (or your co-worker) have.
So to summarize, do not wish away certifications just because someone said so. Also, do not do a certification just because I said so. Think about the benefits that you get out of the certification. Think about the time invested in the certification and whether it is time that would return more value if invested elsewhere. Some certifications are valuable only for a certain time period; in that case, are you ok with your certification losing value some years down the line, or do you think you can keep updating the certification as the years go by? Think through the pros & cons from your angle (not from mine or someone else's), and take the decision that best fits you. 

Saturday, December 19, 2015

Why a mixed format is not recommended

While pairing with developers, I have often noticed that they have a tendency to periodically do a mixed format.

What is a mixed format?

Now I have no idea whether this is the official term, but here is what I mean when I say, “mixed format”. A mixed format is when a developer, working on some code, comes across some other code that is not formatted as per the project’s conventions. This code could span a few lines, or in worse cases, a whole file. The developer immediately invokes his editor’s format command, and formats the offending lines, or the whole file. With a satisfied smile on his face, the developer moves on to complete whatever work he was originally tasked to do. He then creates a commit that includes:
  • the work he was originally tasked to do, and
  • the formatting that he set right.

What’s wrong here?

Now, from the point of view of clean code and team work, formatting is not wrong. However, I do not recommend crafting a commit that mixes both format changes and logic changes, when the following conditions hold true:
  1. The format changes are not related to the actual lines that the logic change encompasses
  2. The format changes outnumber the logic changes
Why? Consider what happens when the developer goes ahead and checks in his code to the VCS. Other developers reviewing his commit immediately notice that the commit’s code changes are too many - this results in an impression forming in the reviewer's mind that can range from “Wow, this is a large commit. I need to go line by line” to a feeling of just giving up. With inexperienced or bored developers, it is usually the latter.
Also consider what happens when sometime in the future, a developer realizes that your commit introduced a line that causes a bug. In order to ensure a clean fix, he opens your commit with the intention of understanding what you intended to fix. And he arrives at the same realisation - your code changes are too many. Without any choice, he is forced to go through each line to understand what it does. Imagine his frustration when most lines turn out to be formatting changes, and hidden among the formatting changes is the actual change he’s looking for.
The lesson here is to avoid mixing large formatting changes with logic changes. Prefer to format only those lines that your feature or bug fix already needs to change. If you can’t avoid this, then make two commits - one for the feature/bug changes, the other just for the formatting changes.

This is only a recommendation, not a rule

As soon as you read this, please don’t fire up the comments editor or your blog editor to write a comment/blog about why I am wrong. I understand this is basically a Considered Harmful essay, and I know that Considered Harmful essays are considered harmful. With that in mind, I’ll only say that the above is a recommendation, not a rule. When making such a commit, please do think about how a future you would feel if you came across such a commit, and how you’d react. 

Sunday, December 21, 2014

Git: What are diffs and hunks?

When I was learning Git for the first time many years ago, one of the features that made me go, "Wow!! That's something I have really wanted all these years!" was the ability to choose which changes to commit among all the changes in a given file. I hadn’t seen this in the other version control systems I’d used, which were CVS and SVN.
Here’s an example of what I am trying to illustrate. Suppose I have a file named Employee.java with the following contents,
class Employee {
     private String firstName;
     private String lastName;

     Employee(String firstName, String lastName) {
          this.firstName = firstName;
          this.lastName = lastName;
     }

     public boolean equals(Object o) {
          if (!(o instanceof Employee e))
               return false;
          return e.firstName.equals(this.firstName) && e.lastName.equals(this.lastName);
     }
}
Ignore the fact that there's no hashCode() implementation, please!!
You decide to add more functionality to Employee.java, namely, a grade instance variable and a toString() method that returns a description of who the employee is and what he does. Employee.java now looks like this:

class Employee {

     private String firstName;
     private String lastName;
     private String grade;

     Employee(String firstName, String lastName, String grade) {
          this.firstName = firstName;
          this.lastName = lastName;
          this.grade = grade;
     }

     public boolean equals(Object o) {
          if (!(o instanceof Employee e))
               return false;
          return e.firstName.equals(this.firstName) && e.lastName.equals(this.lastName);
     }

     public String toString() {
          return "I am " + this.firstName + " " + this.lastName + ", working as " + this.grade;
     }
}
Ignore the fact that grade is not part of equals(), please!!
When you do a git diff on Employee.java, this is what you get:

When you do a git add at this point, all the newly introduced code will be staged for commit. Let’s say you want to add the toString() method as a separate commit. In other VCSs, that's not simple: you would have to maintain two copies of Employee.java, with one copy introducing the grade variable, and another copy introducing toString(). This is cumbersome, but in Git, it is very easy. You just do
git add -p
which lets you choose, piece by piece, which code changes to stage for the commit. For the above example, git add -p shows you the first changed piece of code and asks whether you want to stage it.


At this point, keying in 'y' will add this to the index, after which the next piece of code change is shown.


and so on…
When I learnt this, I thought, “All that’s fine, but what is the word ‘hunk’ doing in ‘Stage this hunk’? What does it mean anyway?”
To understand what a hunk is, you’ll have to know more about the output of the diff command. Note that we are not talking about git diff, but just diff.

Understanding the diff command

diff is the Unix/Linux command that generates a report documenting the differences between two files. According to Wikipedia, given two files, a and b, with b being an updated version of a, diff reports what changes should be made to a to turn it into b.
The report that diff generates can be in 3 forms. They are: a) Edit script, b) Context format, or c) Unified format. With git diff, we get the Unified format.
The unified format, explained in short, goes like this:
The entire output of diff is called ‘diff’. That’s why people often say, “Send me the diff”. They are actually asking for the output of the diff command.
A diff begins with two lines that indicate the two files being compared. The first line begins with ‘---’ and indicates the original file, while the second line begins with ‘+++’ and indicates the newer file. Line additions are preceded with a ‘+’ symbol, line deletions with a ‘-’ symbol, and unchanged lines shown for context with a single space. Line modifications are represented as a combination of line deletion and addition.
Now, when a change occurs to a file, the change can be:  a) in only one line, b) in consecutive lines, or c) in lines spread all over the file.
Thus, the receiver of a diff would like to know which line numbers in the original unchanged file were changed. Hence, it is enough if the output of diff includes a special line that indicates the starting line position of the change, as well as the destination line position, followed by the actual changes. The destination line position is included since earlier changes in the same diff could have pushed the original line further down the file.
However, it is possible (especially in open-source projects) that two separate users apply changes to the same line of a file. When integrating these two changes, having only the line numbers is not enough. You also need to provide some context, by which we mean some lines before and after the changed line. This is useful when applying conflicting changes like the one above, as we can use it to determine how the second change should fit on top of the first change.
The unified format handles both by providing context around the changed lines, and also providing a special line that indicates where in the file the block of lines being shown (the context lines plus the changed lines) starts, and how many lines that block covers, for both the original and the modified file. To mark these special lines as information for the receiver rather than as part of the files, the unified format surrounds them with ‘@@’ symbols. Such lines are called range information lines. The format of a range information line is:
@@ -<<starting line number in original file,number of lines from original file>> +<<starting line number in modified file,number of lines from modified file>> @@
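To make the format concrete, here is a minimal Java sketch that parses a range information line with a regular expression. The class name RangeInfo and its field names are hypothetical. One detail worth knowing: when a count is 1, diff leaves it out (for example, @@ -3 +3 @@), so the sketch defaults a missing count to 1. Any text after the second @@ is kept separately.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeInfo {

    // -<start>[,<count>] +<start>[,<count>], optionally followed by text after the second @@.
    // A count is left out when it is 1, e.g. "@@ -3 +3 @@".
    private static final Pattern RANGE =
            Pattern.compile("@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@(.*)");

    final int originalStart;
    final int originalCount;
    final int modifiedStart;
    final int modifiedCount;
    final String trailingText; // e.g. "class Employee {", added by Git for orientation

    private RangeInfo(int originalStart, int originalCount,
                      int modifiedStart, int modifiedCount, String trailingText) {
        this.originalStart = originalStart;
        this.originalCount = originalCount;
        this.modifiedStart = modifiedStart;
        this.modifiedCount = modifiedCount;
        this.trailingText = trailingText;
    }

    static RangeInfo parse(String line) {
        Matcher m = RANGE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a range information line: " + line);
        }
        return new RangeInfo(
                Integer.parseInt(m.group(1)),
                m.group(2) == null ? 1 : Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4) == null ? 1 : Integer.parseInt(m.group(4)),
                m.group(5).trim());
    }

    public static void main(String[] args) {
        RangeInfo r = parse("@@ -12,5 +13,9 @@ class Employee {");
        System.out.println(r.originalStart + "," + r.originalCount + " -> "
                + r.modifiedStart + "," + r.modifiedCount + " [" + r.trailingText + "]");
        // prints: 12,5 -> 13,9 [class Employee {]
    }
}
```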

Understanding Employee.java diff

This should now help us understand the output of git diff that we did on Employee.java earlier. Let’s take a look at it again:

The first two lines that you see,
diff --git a/Employee.java b/Employee.java
index b2ea747..cbdaf9e 100644
are generated by Git. Beyond this is the actual diff output, so let's move on to the diff.

The first two lines in the diff,
--- a/Employee.java
+++ b/Employee.java
are the two files that diff is trying to compare. Employee.java is prefixed with ‘a/’ and ‘b/’ in the two lines because Git is comparing two versions of Employee.java: the one in the index (which, if you haven't staged anything yet, is the same as the copy in HEAD) and your working copy. Git represents these two versions of Employee.java as being in two folders, ‘a/’ and ‘b/’, just as a way of differentiating them. In reality, if you had used plain diff, you would have provided two files physically present on the filesystem.

The first range information line is:
@@ -1,6 +1,7 @@
In the range information line, the “-1,6” indicates that the block shown from the original file starts at the first line of the file and covers 6 lines. The “+1,7” indicates that the block shown from the new file starts at the first line of the file and covers 7 lines. Why 7? Because of the addition of the grade variable, which is only present in the new file.
The second range information line is:
@@ -12,5 +13,9 @@ class Employee {
In this range information line, the “-12,5” indicates that the block shown from the original file starts at the 12th line of the file and covers 5 lines. The “+13,9” indicates that the block shown from the new file starts at the 13th line of the file and covers 9 lines. Why is the starting line position in the new file 13? Because of the addition of the grade variable earlier in the file. Why 9 lines? Because of the addition of the toString() method. The text after the second ‘@@’ (class Employee {) is not part of the range information: Git appends the nearest preceding line that looks like a function or class header, to help you see where in the file the change is.

So what’s a hunk?

Now that you’ve understood the diff output, hunks are easy to understand. A hunk is simply a range information line together with the change lines that follow it, up to the next range information line (or the end of the diff). So when git add -p asks “Stage this hunk?”, it is asking whether you want to stage that one block of changes.
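The definition of a hunk translates directly into code. Below is a minimal Java sketch (the class name Hunks and the sample notes.txt diff are hypothetical) that splits unified diff text into hunks, counts the context, added and deleted lines in each, and checks the counts in the range information line: the original-file count should equal context plus deleted lines, and the modified-file count should equal context plus added lines. The sample diff is hand-written with one line of context instead of Git's default three, to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Hunks {

    private static final Pattern COUNTS =
            Pattern.compile("@@ -\\d+(?:,(\\d+))? \\+\\d+(?:,(\\d+))? @@.*");

    // A hunk: one range information line plus the lines that follow it.
    static final class Hunk {
        final String rangeLine;
        final List<String> lines = new ArrayList<>();

        Hunk(String rangeLine) {
            this.rangeLine = rangeLine;
        }

        // Counts lines starting with ' ' (context), '-' (deleted) or '+' (added).
        int count(char prefix) {
            int n = 0;
            for (String line : lines) {
                if (!line.isEmpty() && line.charAt(0) == prefix) {
                    n++;
                }
            }
            return n;
        }

        // Original count == context + deleted; modified count == context + added.
        boolean countsMatch() {
            Matcher m = COUNTS.matcher(rangeLine);
            if (!m.matches()) {
                return false;
            }
            int original = m.group(1) == null ? 1 : Integer.parseInt(m.group(1));
            int modified = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
            return original == count(' ') + count('-') && modified == count(' ') + count('+');
        }
    }

    // Splits unified diff text into hunks. Lines before the first range information
    // line (diff --git, index, --- and +++) belong to the file header and are skipped.
    static List<Hunk> split(String diff) {
        List<Hunk> hunks = new ArrayList<>();
        Hunk current = null;
        for (String line : diff.split("\n")) {
            if (line.startsWith("@@")) {
                current = new Hunk(line);
                hunks.add(current);
            } else if (current != null) {
                current.lines.add(line);
            }
        }
        return hunks;
    }

    // A hand-written diff of a hypothetical notes.txt, made with one line of context.
    static final String SAMPLE = String.join("\n",
            "--- a/notes.txt",
            "+++ b/notes.txt",
            "@@ -2,2 +2,3 @@",
            " bravo",
            "+bravo-and-a-half",
            " charlie",
            "@@ -7,3 +8,3 @@",
            " golf",
            "-hotel",
            "+HOTEL",
            " india");

    public static void main(String[] args) {
        for (Hunk h : split(SAMPLE)) {
            System.out.println(h.rangeLine + "  added=" + h.count('+')
                    + " deleted=" + h.count('-') + " countsMatch=" + h.countsMatch());
        }
    }
}
```

Note how the second hunk starts at line 8 in the new file but line 7 in the original: the line added by the first hunk pushed everything after it down by one, just as the grade variable did in Employee.java.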

Wednesday, July 03, 2013

Restaurants: A novel way to remember orders!!

On a recent trip to the US, we used to go out for lunch with our clients to various places.

One hotel we went to seemed to be pretty popular, and there was usually a crowd during lunch. On this particular day, we sat down and placed our orders. We were a huge group, so our entire order was not easy to remember. But I remembered reading somewhere that waiters were good at remembering orders, so I didn't worry about it. "She has noted it down on a notepad anyway, so it shouldn't be a problem for her," I thought.

Our first order arrived, carried by a different waiter from the one who took the order. She came straight to the table, and placed it right in front of the person who had asked for that item.

I was surprised that even though she was different from the one who took our order, she knew which customer had ordered that item. I put it down to the original waiter informing the new one of who had placed that order.

The subsequent orders came and the same thing happened again and again. I was surprised. I looked around at another table, and after some time, noticed the same pattern. Different waiters would serve the same tables, and each waiter knew which customer had ordered what. These same waiters were also serving other tables, and even there, they seemed to know who had ordered what.

"Can they really remember to such an extent?" I wondered. I didn't think I could.

------------------

I forgot about the incident and was reminded of it on another day, when we went to a mobile diner of sorts. The mobile diner is just the same as the street food stalls and vans that we see in India.

I placed my order, and the lady gave me my copy of the receipt she wrote the order on.

Along the top of the receipt was a row of figures.

There are various shapes with some numbers arranged around them. There is also a circled 'S' symbol.

The shapes are the tables in the restaurant. The numbers around the shapes are the seats at those tables, so each customer is assigned a number. The circled 'S' symbol is the waiter; it probably stands for "server".

When the waiter arrives to take your order, she stands in the position marked by the circled 'S'. She then notes down your order according to the position in which you sit. Thus, if you are the first on her left side, your order is marked against number 1.

This paper is then maintained until the orders are ready, at which point the waiter brings the food to the table along with the paper. Since she knows the name of the food, it's easy to find the customer's position from the paper. She then serves it directly to the customer!! This ensures that any free waiter can serve the food back to the table, and it is not necessary to wait for the original waiter to serve, or to ask the original waiter whom to serve to.

I saw this for the first time in my life in the US, and am not sure whether it exists in India. In most Indian restaurants I have been to, when the waiter comes to serve me food, I am the one indicating to the waiter which food should go to whom.

Thursday, January 24, 2013

JAXB - Generating a <simpleType> with more than 256 <enumeration>s


So this was a strange error that we faced a few weeks ago.

The client we work for has various teams with each exposing their functionality to other teams via web services. So, in effect, a web application can be built, with it talking to various web services to get work done. Our work that day was to make a new web service. This was similar to another web service, with certain differences in inputs and functionality between the two. For various reasons, we decided to create a copy of the first web service's WSDL file and make the changes in inputs to the second WSDL.

While we were doing so, we found that the previous WSDL had a field for accepting the country code, but its data type was marked as string. We felt that this could lead to wrong country codes in the database, as people could input any value. Our database also had a master table that stored the country codes. While the web service code did verify the input against the table, we decided to change from string to a simpleType whose values were restricted to a fixed list of <enumeration>s. This would mean that our clients would never be able to provide invalid values.

Basically, we wanted to change from:

<element name="countryCode" type="string"></element>
to
<element name="countryCode" type="CountryCode"></element>
with CountryCode type being defined thus:
<simpleType name="CountryCode">
  <restriction base="string">
    <enumeration value="IN"></enumeration>
    <enumeration value="US"></enumeration>
  </restriction>
</simpleType>

Since our database has 262 country codes, we decided to list all of them, thus having 262 <enumeration> entries in CountryCode. This wasn't as much work as we had initially thought, thanks to copy-paste and IDEA's column selection feature.

We use Apache's cxf-codegen-plugin in our project to generate the Java classes that do much of the XML-Java conversions. cxf-codegen-plugin ties into Maven's generate-sources phase to generate the Java classes. So when we ran mvn generate-sources, we expected an enum type called CountryCode with 262 fields.

In reality, the class was not generated at all.

I immediately suspected the number of enum fields, because I had never written or seen a Java enum with that many fields. So we trimmed the simpleType to one entry and ran mvn generate-sources, and the result was that the CountryCode class was generated, with one field. When we brought back the entire list, no class was generated. So we commented out the entire list and slowly uncommented a few entries (from the top) one by one to see at which point the error occurred. The Java file was generated fine all along until we reached the final few entries (about 6 or so). At that point, the Java file was not generated.

Again, the thought of some count limitation entered our heads. We also considered the possibility that some character we had pasted was in a different encoding, or that some stray whitespace character had crept into our code during the copy-paste. To rule out the second possibility, we deleted the <enumeration> entries for those 6 country codes and keyed them in manually. It still did not work. To rule it out further, we commented out all entries and then uncommented them from the bottom up. The Java file was generated until we reached the top few entries, at which point it failed.

So we were back to our count hunch.

We wondered whether WSDL had an issue with so many <enumeration>s. We didn't think so, but we decided to check anyway. The WSDL spec did not mention any limit on the number of <enumeration> entries in a <simpleType>. So we felt it had to be an issue with either the cxf-codegen-plugin or Java. Googling revealed that a Java class, and therefore an enum, can have at most 65535 fields; in practice, the 64 KB size limit on the enum's static initializer lowers the maximum number of constants to a few thousand. Either way, 262 was far below the limit, so we ruled out Java as the problem.

That left only the cxf-codegen-plugin. Googling revealed that it internally uses JAXB. Further Googling brought up a page which said that you had to add the typesafeEnumMaxMembers attribute to your <globalBindings> tag to let JAXB generate an enum type with more than 256 constants. This <globalBindings> tag is present in the bindings.xjb file in our project. We set typesafeEnumMaxMembers to 300 and found that we were able to generate the CountryCode.java file, with all 262 enum constants!!

<globalBindings typesafeEnumMaxMembers="300"/>
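Once the limit is raised, xjc maps the restricted simpleType to a Java enum. The sketch below approximates the shape of that generated CountryCode.java for the two-entry example above. It is hand-written, not actual plugin output, and it leaves out the JAXB annotations (@XmlType, @XmlEnum) that the real generated file carries, so that it compiles without the JAXB API on the classpath.

```java
// Hand-written approximation of the enum xjc generates for the CountryCode
// simpleType, trimmed to the two sample values. The real generated file also
// carries JAXB annotations (@XmlType(name = "CountryCode"), @XmlEnum), which
// are omitted here so the sketch compiles with the plain JDK.
enum CountryCode {

    IN,
    US;

    // Returns the XML value for this constant.
    public String value() {
        return name();
    }

    // Maps an XML value back to a constant; valueOf throws
    // IllegalArgumentException for any code outside the enumeration.
    public static CountryCode fromValue(String v) {
        return valueOf(v);
    }
}
```

With the plain string type, a value such as "XX" would reach the service code untouched; with the enum, fromValue rejects it because it is not one of the listed constants.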

This was a great relief, since we had been Googling for many hours and had become frustrated. Googling further, we learnt more about JAXB and the xjc tool. I knew that JAXB could be used to convert XML to Java and vice versa, but I had never really delved into it, so xjc was new to me. In the end, I understood that it was xjc that generated the Java classes. You can customise the way xjc generates the classes by creating an external bindings file, which by convention has the extension '.xjb'.

And that's where the file 'bindings.xjb' in our project came in. You tell xjc about this bindings file by passing its name to the -b option of the xjc command. Since we were using the cxf-codegen-plugin rather than calling xjc directly, we configured these arguments in the <execution> section of the plugin's entry in pom.xml. Basically, we did this:


<plugin>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-codegen-plugin</artifactId>
  <executions>
    <execution>
      <!-- run wsdl2java during Maven's generate-sources phase -->
      <phase>generate-sources</phase>
      <goals>
        <goal>wsdl2java</goal>
      </goals>
      <configuration>
        <defaultOptions>
          <extraargs>
            <extraarg>-b,${basedir}/src/main/resources/bindings.xjb</extraarg>
          </extraargs>
        </defaultOptions>
      </configuration>
    </execution>
  </executions>
</plugin>



One thing that made us wonder was why there was a limit in the first place, and why the default value was 256. We were not able to find an answer, but the JAXB spec itself gives the default value as 256. I read somewhere on the Internet that this was because a Java enum with more than 256 entries is unmanageable and unmaintainable. But we felt that even 100 - 200 entries would be unmanageable; in that case, why isn't the default value somewhere between 100 and 200? Why specifically 256?

Friday, October 29, 2010

Great circle routes

In late March, my company informed me that I had to travel to the US and asked me to get everything required ready. By the time my visa was ready and I was able to book tickets, it was already the third week of April. It looked like I could only book tickets for a day in the fourth week of April or the first week of May.

On April 14th, the Eyjafjallajökull volcano erupted.

As you probably know, airline schedules went haywire. However, I was not too worried. After all, the volcano was in Iceland, and the affected areas were mostly in Europe. Surely, my flight to the US wouldn't be travelling over Europe!! Why should it? When travelling from India to the US, I thought, my flight would probably take the Saudi Arabia - Egypt - Algeria - Atlantic Ocean - US route, wouldn't it? After all, when you have a map of the world in front of you, that seems to be the straightest and most efficient route. A map of the route I thought my flight would take is shown below:



Of course, if the ash cloud grew to the point where it began to intrude into North Africa, then I would have some problems, but I decided to worry about that if it ever came to it.

By the time I was ready to book my tickets, the situation had eased a little: flights were allowed so long as they flew via routes where the ash was less concentrated. When I went to the travel desk, I was told that no bookings were being made, as all flights had been cancelled. I was somewhat surprised and reminded the travel desk that flights were being allowed to fly. The travel desk replied that though flights were being allowed, the airlines were concentrating on clearing the backlog of passengers first. I reported this to my manager, who told me that as my travel was urgent, I would have to get tickets somehow.

I went back to the travel desk. "I need a ticket to the US".

"No sir... as we already said, the airlines are not accepting bookings. They are only trying to clear the backlog".

"Ok... which route are you considering?"

The lady mentioned some routes via Europe and the Middle East. I understood it was impossible via Europe, as that was the most affected area, but why were there no tickets for routes via the Middle East?

Me: "Can't you book on the Chennai - Singapore - US route?" I was thinking that a Singapore - US flight would probably fly over the Philippines and the Pacific Ocean to the US, which meant that it would avoid the ash cloud.

"No sir... no bookings".

Confused, I tried to be even more clear. I said, "No.. I mean the Chennai - Singapore - Tokyo - US route. Surely, there should be some tickets there!!"

"No sir... bookings not allowed".

Even more confused now, I asked the lady why bookings were not allowed on that route. Surely the ash cloud was not affecting those areas!!

"I don't know sir... but bookings are not allowed".

Not wanting to argue any further, I reported this to my manager. There followed a long series of trips to and from the travel desk, trying desperately to book a ticket ASAP. Every day I visited them at least once, and sometimes twice. No change. Another manager suggested booking on the India - Johannesburg - US route, which the lady frowned upon. Along the way I learnt that my company had an upper limit on the total cost of a ticket, which effectively ruled out some routes.

Finally, after one or two weeks, my tickets were confirmed. My route was Chennai - Doha - Washington by Qatar Airways - a hop through the Middle East. I wondered why this ticket was not available earlier.




With packing and other travel-related work, this issue went to the back of my mind. Finally, the day came, and I boarded the Qatar flight to Doha. The 5-hour journey was uneventful and I landed in Doha. Two hours later, I boarded my Doha - Washington flight.

Once settled into my seat, I noticed that this plane had TV screens behind each seat and at the front of every passenger section. They were showing the route we would take, and it was this:


(Note: The route shown here is not the exact route my flight took. Though it has been a few months since my flight, I do remember the route going over Finland as well as Iceland and Greenland. But you do get the general idea).

Hello?

I didn't believe it at first!! Surely this must be a mistake. The flight was not going to travel over the Mediterranean Sea or North Africa, which meant it was taking a roundabout route. I dismissed what the display was showing and decided to note the route the plane actually took.

We departed. As I had a meeting the day after I landed in the US, I had planned to take naps during the flight to avoid jet lag as much as possible. I had a short nap. Lunch was served. All along, I kept watching the display. The flight took the route shown before. I thought that at some point the flight would turn onto the route I had expected, but no, it kept going on and on along the route shown, until many hours later, I reconciled myself to the fact that it was not going to change direction.

I was angry, I admit it. I was needlessly being kept on a flight for 14 hours when a shorter, more direct route existed, one that would take less time. But I soon realized I could be mistaken. No pilot would do that; he could be reprimanded by the airline. I also knew that flight plans were often prepared by someone other than the pilot, and if that someone had prepared this route, the pilot would want to know why. Air traffic controllers on the ground would also want to know why the flight was taking this route. More importantly, I remembered reading somewhere that fuel costs alone were a significant percentage of an airline's expenses; no pilot would be foolish enough to fly a route longer than the shortest one unless there were reasons. To top it all, this was my first international flight, and there was always a possibility that I might not know something.

In short, everything was stacked against me. If my thoughts about the flight's route were right, then it had to be a very exceptional case, and I would hear about it on landing; otherwise, I was surely wrong. I suspected the latter.

Realization dawned somewhere over the Atlantic, I guess.

Great circle


The initial route I had arrived at (Chennai - Saudi Arabia - Egypt - Algeria - Atlantic Ocean - US) had been based on a paper map of the world. I had plotted the most direct route as if the Earth were flat, as shown on a paper map. But the Earth is not flat; it is a sphere, and Doha and Washington lie roughly on opposite sides of it. On this spherical image of the Earth, my expected route would look like this:



Adding the route my flight actually took to the above map makes it look like this:



Clearly, on the globe the actual route is a straight line, whereas the one I initially thought of is curved and covers a greater distance. And we all know from our geometry class that a straight line is the shortest distance between any two points. Note the route taken: it passes over Europe, crosses the Atlantic Ocean and enters North America over Canada, which is also roughly the route my flight took.

So yes, I was wrong, and the route taken was the shortest one. But if my new understanding was right, then it had to be documented somewhere. A search on Google/Wikipedia should reveal whether I was right. And yes, Wikipedia has an article on it. Such routes are known as "great circle routes", since the shortest path between two points on a sphere lies along a great circle: the circle formed where the sphere meets a plane passing through its centre. An arc of a great circle is the sphere's equivalent of a straight line in plane geometry.
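To put numbers on the difference between a paper-map route and a great circle route, here is a minimal Java sketch. It assumes a perfectly spherical Earth with a mean radius of 6371 km, and it treats "a straight line on a paper map" as a rhumb line (a path of constant compass bearing), which is exact only for a Mercator map. The class and method names are hypothetical, chosen for illustration.

```java
// Compares two ways of joining the same pair of points on a spherical Earth:
// the great-circle distance (shortest path on the sphere, via the haversine
// formula) and the rhumb-line distance (constant compass bearing, i.e. a
// straight line on a Mercator map). Coordinates are in decimal degrees.
final class RouteDistance {

    // Mean Earth radius in km; treating the Earth as a sphere is an approximation.
    static final double EARTH_RADIUS_KM = 6371.0;

    private RouteDistance() {}

    // Great-circle distance using the haversine formula.
    static double greatCircleKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp in case rounding pushes h slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    // Rhumb-line distance; endpoints must not lie exactly on a pole.
    static double rhumbLineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        // Go the shorter way round in longitude.
        if (Math.abs(dLambda) > Math.PI) {
            dLambda = dLambda > 0 ? dLambda - 2 * Math.PI : dLambda + 2 * Math.PI;
        }
        // Difference in Mercator-projected latitude.
        double dPsi = Math.log(Math.tan(Math.PI / 4 + phi2 / 2)
                / Math.tan(Math.PI / 4 + phi1 / 2));
        // On an east-west course dPsi is ~0, so use cos(latitude) instead.
        double q = Math.abs(dPsi) > 1e-12 ? dPhi / dPsi : Math.cos(phi1);
        return EARTH_RADIUS_KM * Math.sqrt(dPhi * dPhi + q * q * dLambda * dLambda);
    }
}
```

For example, RouteDistance.greatCircleKm(25.27, 51.61, 38.94, -77.46) and RouteDistance.rhumbLineKm with the same arguments compare the two routes for Doha and Washington Dulles (coordinates approximate). Since a great circle arc is the shortest path on a sphere, the rhumb-line figure can never come out smaller.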

Good, so that explains my flight. How about other flights? For example, Europe to America. Let us take Frankfurt - Washington, since there is a Chennai - Frankfurt flight and there is a chance I could have flown on it.



Great circle again!!

Ohkay, now how about Singapore - Washington? Does it also take a great circle route?



Aha, it does!! So this is the reason why the travel desk could not book on this flight. Probably, this flight too was cancelled!!

Interestingly, this route seems to pass right over the North Pole. That should be exciting: imagine sitting on an airplane, having your lunch and looking at the display in front of you, which says you are flying over the North Pole. How thrilling would that be? In fact, while there is currently no flight between Singapore and Washington, there is a Singapore - New York flight operated by Singapore Airlines, and it passes within a few miles of the North Pole (see here for proof).
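Why does a route between two cities that are nowhere near the Arctic head up towards the pole? One way to see it is to sample points along the great circle and look at the highest latitude reached. The sketch below does this by spherical linear interpolation between the two endpoints' unit vectors, again assuming a spherical Earth; the class and method names are hypothetical.

```java
// Samples points along the great circle between two points on a spherical
// Earth by interpolating between their unit vectors (spherical linear
// interpolation). Latitudes and longitudes are in decimal degrees.
final class GreatCirclePath {

    private GreatCirclePath() {}

    // Converts latitude/longitude (radians) to a unit vector {x, y, z}.
    private static double[] toVector(double phi, double lambda) {
        return new double[] {
            Math.cos(phi) * Math.cos(lambda),
            Math.cos(phi) * Math.sin(lambda),
            Math.sin(phi)
        };
    }

    // Returns {lat, lon} of the point a fraction f (0 to 1) of the way along
    // the great-circle route from point 1 to point 2.
    static double[] intermediatePoint(double lat1, double lon1,
                                      double lat2, double lon2, double f) {
        double[] p = toVector(Math.toRadians(lat1), Math.toRadians(lon1));
        double[] q = toVector(Math.toRadians(lat2), Math.toRadians(lon2));

        // Central angle between the endpoints; clamp the dot product to [-1, 1].
        double dot = p[0] * q[0] + p[1] * q[1] + p[2] * q[2];
        double delta = Math.acos(Math.max(-1.0, Math.min(1.0, dot)));
        if (delta < 1e-12) {
            return new double[] {lat1, lon1};  // coincident endpoints
        }
        if (Math.PI - delta < 1e-12) {
            // Antipodal endpoints: every great circle through them is equally short.
            throw new IllegalArgumentException("antipodal endpoints: route is not unique");
        }

        double a = Math.sin((1 - f) * delta) / Math.sin(delta);
        double b = Math.sin(f * delta) / Math.sin(delta);
        double x = a * p[0] + b * q[0];
        double y = a * p[1] + b * q[1];
        double z = a * p[2] + b * q[2];
        return new double[] {
            Math.toDegrees(Math.atan2(z, Math.hypot(x, y))),
            Math.toDegrees(Math.atan2(y, x))
        };
    }

    // Highest latitude among n + 1 evenly spaced samples; n must be positive.
    static double maxSampledLatitude(double lat1, double lon1,
                                     double lat2, double lon2, int n) {
        double max = -90.0;
        for (int i = 0; i <= n; i++) {
            double[] point = intermediatePoint(lat1, lon1, lat2, lon2, (double) i / n);
            max = Math.max(max, point[0]);
        }
        return max;
    }
}
```

For instance, GreatCirclePath.maxSampledLatitude(1.35, 103.99, 40.64, -73.78, 1000) samples the Singapore - New York (JFK) route, using approximate airport coordinates. The two cities are almost 180 degrees of longitude apart, so the great circle joining them runs almost along a meridian, up one side of the globe and down the other, which is why its highest point lands close to the pole.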

Interesting, but does this work for flights in the Southern Hemisphere too? Let us take a flight from Sao Paulo (Brazil) to Sydney (Australia).



Woo... the flight passes over Antarctica!!

Update (11th Feb 2011): So the lesson here is that if a flight flies between two cities that lie in the same hemisphere, its route is plotted as a great circle route (assuming the weather along the route is fine; otherwise there will be deviations). Note the words "lie in the same hemisphere". Why should the cities lie in the same hemisphere? Do flights between cities on opposite sides of the Equator not travel along great circle routes? Yes, they do, but those routes roughly match the route you would draw on a paper map.

Nope, I shall put my hand up and admit that I was wrong in that last paragraph. I had assumed that flights that cross the Equator would more or less follow the straight line you would draw between the two cities on a flat Earth. I guess I based this assumption on the fact that the route between Doha and Sao Paulo looks like this:



which is, erm... roughly a straight line.

Unfortunately, this one example is not enough to argue that trans-equatorial flights do not fly on great circle routes, or that their routes are roughly straight lines on a paper map. A single counterexample shows otherwise: London to Sydney. I expected the route to be somewhat like this:



But in reality the route turns out to be this:



Why is this so? Again, I expected the first route because my mind still thinks of the world as a flat paper map. But of course the Earth is not flat, which is why the actual route is the second one. Here is how the route looks when we rightly visualize the Earth as a sphere:




So the lesson here is to think of the Earth as a sphere when mapping flight routes between two points, wherever those two points may be and whatever the distance between them!!

(All flight routes generated by the excellent flight route mapping website, Great Circle Mapper).

UPDATE (16th Dec 2012): More evidence of this is FlightRadar24, a website that shows flights travelling across the globe in real time. You can see for yourself the routes flights take.