Wednesday, September 20, 2017

CSS Selectors - Level 1 & 2

As my interest in CSS has grown, I have come across more CSS selectors that I should have used in my projects but haven't, simply because I didn't know about them. I wrote this blog post mostly as a way to freeze these selectors in my brain so that I remember to use them next time. To keep things simple, I make two assumptions for this post:
  1. We're trying to use CSS for HTML, and
  2. We're only using CSS selectors up to and including CSS Selectors Level 3. Level 4 is not yet a W3C Recommendation, so when it becomes one, I shall write a new blog post.
Also, this blog post talks only about selectors in CSS 1 & 2. Selectors in CSS 3 will be looked at in another blog post.

Level 1 CSS selectors


When CSS was first announced, it was called Level 1 CSS, and it had only a few selectors. (Later "versions" of CSS added more selectors as the demands of web designers grew.) Because CSS 1 is quite old, it's also extremely well-known among developers & designers, so I'll skim over its most popular selectors.

Type selector

This selector only contains the name of an HTML tag. The associated style is applied to every element of that type on the page.
An example: span {color:red}

This will make all text inside <span> tags red in colour.

ID selector

The ID selector contains the value of an HTML element's id attribute. Only the element that has that particular value for its id attribute will have the style applied to it. The ID selector is written as the hash (#) symbol followed immediately by the element's ID, with no space in between.

Eg: #content {color:red} will render the HTML element with ID 'content' in red coloured text.

Class selector

The class selector defines a style that can be applied to a class (aka. category) of HTML elements. What determines if an element belongs to the class or not is the class attribute of the HTML element. If the class attribute's value matches the class selector, then the style is applied.

As an example, consider two span tags,
<span id="span1">Content</span>
<span id="span2" class="important_content">Warning message</span>

and a style definition,
.important_content {color: red;}

In the above case, only the <span> tag with id 'span2' will have the content in red colour. This is because it is the only <span> tag that belongs to the class of elements represented by 'important_content'. Other <span> tags in the same document do not belong to that class of elements and hence will not have the content in red colour.

Because this selector is used to categorize elements into a single grouping, you can have multiple elements having the same value for the class attribute - all those elements will be grouped into a single category represented by that class selector, and the style will be applied for all those elements.

Descendant selector

The descendant selector is used to select any element that is a descendant (a child, grandchild, and so on) of another specified element.

The syntax is ancestor targeted_element. The style is applied to targeted_element.

Here is an example. Consider you have the following HTML:
<div>
    <span>Span tag inside a div tag</span>
    <ul>
        <li>
            <span>This is the first list item.</span>
        </li>
        <li> This is the second list item.</li>
    </ul>
</div>
with this CSS applied:
div span {color:blue;}
In the output, both <span> tags are rendered in blue, including the one nested inside the list item.

What's happening here is that all <span> tags at all levels of the <div>'s subtree get the style applied.

Pseudo-classes

CSS 1 had 3 pseudo-classes: :link, :visited, and :active. In CSS 1, these work only on links, so defining them on other elements has no effect. The :link pseudo-class specifies how links are normally shown, the :visited pseudo-class specifies how links pointing to URLs you have already visited are shown, and the :active pseudo-class specifies how a link appears while it is being selected (for example, while the mouse button is pressed on it).

Pseudo-elements

CSS 1 had 2 pseudo-elements: :first-letter & :first-line.
Note that in CSS 3, the single colon (:) at the beginning of each pseudo-element has been replaced by the double colon (::). Modern browsers still accept the single-colon form of these older pseudo-elements for backward compatibility, but the double colon is the recommended form, so my example code already uses it.

The :first-letter pseudo-element

The :first-letter pseudo-element is used to style only the first letter of the element's content. Sometimes this is done to indicate that a new paragraph has started; other times it is for publishing effects such as drop caps.
As an example, let us assume that you have the following HTML content:
<div id=first_div>
    This is a single line of text.
</div>
<div id=second_div>76 is a number.</div>
<div id=third_div>
    "Let's go", he said, with determination.
</div>
and the following CSS is applied to the HTML:
#first_div::first-letter {font-size: 20pt;}
#second_div::first-letter {font-size: 20pt;}
#third_div::first-letter {font-size: 20pt;}
In the result, the first letter of each <div> tag is rendered at 20pt.
What's happening here is that the style that increases the font size is applied to the first letter of each <div> tag. We are able to do this without wrapping the first letter in any special tag (eg: a <span> tag). Without the :first-letter pseudo-element, wrapping the first letter in such a tag would be the only way to achieve the same effect.

This is also the reason why :first-letter is a pseudo-element: the browser applies the style as if there were a tag wrapped around the first letter.

You may also notice that if the content of the element starts with a quotation mark, then the :first-letter style applies to both the quotation mark and the letter following it.

The :first-line pseudo-element

The :first-line pseudo-element is used to style only the first line of the element's content. This is often required in publishing scenarios, where publishers may prefer to highlight the first line of a paragraph to make it easier to see that a new paragraph starts there.
As an example, let us assume that you have the following HTML content:
<div>
    They walked into the forest, not entirely unmindful of the animals that lurked there. But to them, much more than the dangers the animals represented, was the fear of losing out - of not achieving their goal.
</div>
and the following CSS is applied to the HTML:
div::first-line {text-transform:uppercase;}
In the result, only the first line of the <div> tag is in upper-case.
Notice that we mean the first line & not the first sentence, i.e., CSS does not look for a full-stop (aka period) to find the end of the first sentence. It only applies the style until the content in the <div> tag wraps to the next line, at which point the style application stops.

If you resize your browser window to make the line longer or shorter, more (or less) content will be made upper-case.

Level 2 CSS selectors

Let us now look at what was newly introduced in CSS 2.

Universal selector

The universal selector represents all elements in an HTML document. The universal selector can only be written using the asterisk (*) symbol.
An example: *{color:red}
This will make all text red in colour.

Attribute selectors

CSS 2 introduces the concept of attribute selectors, where you can specify that your styles must match only those elements that have certain attributes or certain attribute values. CSS 2 introduces 4 attribute selectors (CSS 3 adds 3 more). Let's take a look at the 4 attribute selectors below:

Presence selector

This selector matches those elements that have the attribute specified. The attribute only has to be present in the element - it can have any value or it can have no value.

The syntax of the selector is element[attribute].

As before, here's an example. Let's assume you have the following HTML code:
<div id=first_para>
    <span>This is the first paragraph.</span>
</div>
<div id=second_para>
    <span>And this is the second.</span>
</div>
and the following CSS:
div[id] {color:blue;}
In the result, the text in both <div> tags is blue.
What happened here is that the blue colour styling was applied to both <div> tags, even though they have different HTML IDs. That is because the selector only checks for the presence of the attribute - it doesn't check its value.

Where would such selectors be used? In HTML, there are places where an attribute just has to exist for some behaviour to be triggered, and the HTML spec doesn't require it to have a value. Examples are the checked attribute for checkboxes and the selected attribute for the options in a list box.

Value selector

This selector matches those elements that have the specified attribute with exactly the value given in the selector. Note that for attributes such as id, the comparison is case-sensitive - the element matches only if the value of the attribute has the same case as the one specified in the selector.

The syntax of the selector is element[attribute=value].

Let's assume you have the following HTML:
<div id=first_para>
    <span>This is the first paragraph.</span>
</div>
<div id=First_para>
    <span>This is the second paragraph, but has the same ID, different only in case.</span>
</div>
and this CSS:
div[id=first_para] {color:blue;}
In the output, only the first paragraph is blue.
What happened here is that only the <div> tag whose id attribute has first_para as its value got the style applied. The other <div> tag did not get that style because its value, First_para, doesn't have the exact case used in the selector.

Attribute sub-string selectors

There are 2 attribute sub-string selectors available in CSS 2.

The first selector matches those elements that have the specified attribute with exactly the value given in the selector. However, it also matches if the attribute's value is a list of words separated by spaces, and one of those words is exactly the value specified in the selector.

The syntax of the selector is element[attribute~=value].

Here's an example. Since an HTML id value must not contain spaces, this example uses the class attribute, whose value is exactly such a space-separated list of words. Assume that you have the following HTML:
<div class="content header">
    <span>CSS 2 is now a Recommendation</span>
</div>
<div class="content gist">
    <span>The standards body has approved the various new features in CSS 2.</span>
</div>
<div class="content body">
    <span>Detailed info on CSS 2 is found here.</span>
</div>
and the following CSS:
div[class~=header] {text-transform:uppercase;}
div[class~=gist] {font-style: italic;}
In the output, the first <div> is in upper-case, the second is in italics, and the third is unchanged.
The first rule matched the first <div> tag because its class attribute contains the word header. Similarly, the second rule matched the second <div> tag because its class attribute contains the word gist.
Again, as in previous selectors, the selector matches only if the attribute matches case-sensitively.
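To make the matching rule concrete, here is a minimal Python model of the [attribute~=value] test, applied to the attribute values used in the example above. It is only a sketch and not how a browser implements it. The function name includes_word is made up for illustration, and Python's str.split() treats a few more characters as whitespace than CSS does.

```python
from __future__ import annotations


def includes_word(attribute_value: str | None, word: str) -> bool:
    """Model of [attribute~=word]: the attribute value is a space-separated
    list of words, and one of them must equal `word` exactly (case-sensitive)."""
    if attribute_value is None:
        return False  # the attribute is missing altogether
    # A word that is empty or contains whitespace can never match.
    if word == "" or any(ch.isspace() for ch in word):
        return False
    return word in attribute_value.split()


for value in ["content header", "content gist", "content body"]:
    print(value, includes_word(value, "header"), includes_word(value, "gist"))
```

The Selectors specification states that an empty value, or a value containing whitespace, never matches anything. That is why the model rejects those cases up front.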

The second attribute sub-string selector matches those elements that have the attribute specified, and whose value is either exactly the value specified in the selector or starts with that value immediately followed by a hyphen (-).

The syntax of the selector is element[attribute|=value].

Let's look at an example. Assume the following HTML:
<div id="header-content">
    <span>CSS 2 is now a Recommendation</span>
</div>
<div id="gist-content">
    <span>The standards body has approved the various new features in CSS 2.</span>
</div>
<div id="body-content">
    <span>Detailed info on CSS 2 is found here.</span>
</div>
and the CSS applied is:
div[id|=header] {text-transform:uppercase;}
div[id|=gist] {font-style: italic;}
then in the output, the first <div> is in upper-case and the second is in italics.
As before, the first rule matched the first <div> tag, and the second rule matched the second <div> tag.
Again, the attribute value is matched case-sensitively.
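The [attribute|=value] rule also fits in a couple of lines of Python. The helper below, matches_dash_prefix, is hypothetical and written only to illustrate the rule. The Selectors specification notes that this form is mainly intended for language subcodes, so the same function also shows why [lang|=en] matches both "en" and "en-US".

```python
from __future__ import annotations


def matches_dash_prefix(attribute_value: str | None, value: str) -> bool:
    """Model of [attribute|=value]: the attribute equals `value` exactly,
    or starts with `value` immediately followed by a hyphen (case-sensitive)."""
    if attribute_value is None:
        return False  # the attribute is missing altogether
    return attribute_value == value or attribute_value.startswith(value + "-")


for attr in ["header-content", "gist-content", "body-content"]:
    print(attr, matches_dash_prefix(attr, "header"), matches_dash_prefix(attr, "gist"))
```

Note that a value like "headers-content" would not match header, because the character right after the prefix must be a hyphen.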

Child selectors

CSS 2 introduces the concept of child selectors, where you can specify which children of an element your selector should match. Note that this is different from the descendant selectors we saw in the CSS 1 section - the child selectors in CSS 2 only select children, while the descendant selector can go deeper in the tree.

CSS 2 introduces 2 child selectors, while CSS 3 introduces a lot more. Let's take a look at the 2 CSS 2 child selectors below:

:first-child pseudo-class

The :first-child pseudo-class matches those elements that are the first child of their parent element. In the selector, you only specify the child element (for example, its type) - no information about the parent needs to be provided. Conceptually, the browser finds all elements of that type, checks for each one whether it is a child of some other element, and if so, whether it is that element's first child. The elements that pass this check get the style applied.

The syntax is child-element:first-child.

Consider this example HTML:
<body>
<span>under the body tag</span>
<div>
    <span>First child under the div tag</span>
    <span>Second child under the div tag</span>
</div>
</body>
and this CSS that is applied on that HTML:
span:first-child {color:blue;}
then in the output, the <span> directly under the <body> tag and the first <span> inside the <div> are blue.
What happened here is that the elements placed directly inside the <body> tag are children of the <body> tag. The <span> tag immediately following the opening <body> tag is therefore the first child of the <body> tag, and so the styling is applied to it.

For the <span> tags inside the <div> tag, the first <span> tag is the first child of the <div> & hence got the styling applied, while the second <span> did not.

Again, the advantage here is that we got this styling applied without the need for any special elements to wrap around the first child. Also, in cases of DHTML, the browser takes care of re-applying the style when children are added or removed.
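Here is a minimal Python model of how span:first-child picks elements in the example above. The Element class and the helpers walk and is_first_child are hypothetical stand-ins for the DOM. The model holds only elements and no text nodes, which is fine because :first-child only looks at element children.

```python
from __future__ import annotations


class Element:
    """Tiny stand-in for a DOM element: a tag name, optional text, children."""

    def __init__(self, tag: str, text: str = "", children: list[Element] | None = None):
        self.tag = tag
        self.text = text
        self.parent: Element | None = None
        self.children: list[Element] = []
        for child in children or []:
            child.parent = self
            self.children.append(child)


def walk(node: Element):
    """Yield `node` and every element below it, in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def is_first_child(element: Element) -> bool:
    """Model of :first-child: the element is its parent's first element child."""
    return element.parent is not None and element.parent.children[0] is element


# The example HTML from above, rebuilt as a tree.
body = Element("body", children=[
    Element("span", "under the body tag"),
    Element("div", children=[
        Element("span", "First child under the div tag"),
        Element("span", "Second child under the div tag"),
    ]),
])

# span:first-child {color:blue;}
blue = [el.text for el in walk(body) if el.tag == "span" and is_first_child(el)]
print(blue)
```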

Generic child selector

Another child selector available is the child combinator, which is more general than the :first-child pseudo-class. It is used to match any element that is a child of a specified parent element, so it lets you specify the parent element as well as the targeted child element.

The syntax is parent > child.

Assume the following HTML:
<div>
    <span>first span</span>
    <span>second span</span>
    <ul>
        <li>
            <span>This is the first list item.</span>
        </li>
        <li> This is the second list item.</li>
    </ul>
</div>
and the following CSS:
div>span {color:blue;}
then in the output, the first two <span> tags under the <div> tag are blue.
They are matched because the selector matches all children, not just the first child.

The <span> tag inside the <ul> tag is not matched because the selector only matches children of the parent, not grandchildren & other descendants.
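To contrast div>span with the descendant selector div span on the example HTML above, here is a small Python model. The Element class and the helpers is_child_of and is_descendant_of are hypothetical stand-ins for a DOM. The only difference between the two helpers is how far up the tree they look.

```python
from __future__ import annotations


class Element:
    """Tiny stand-in for a DOM element: a tag name, optional text, children."""

    def __init__(self, tag: str, text: str = "", children: list[Element] | None = None):
        self.tag = tag
        self.text = text
        self.parent: Element | None = None
        self.children: list[Element] = []
        for child in children or []:
            child.parent = self
            self.children.append(child)


def walk(node: Element):
    """Yield `node` and every element below it, in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def is_child_of(element: Element, parent_tag: str) -> bool:
    """Model of `parent_tag > element`: only the direct parent is checked."""
    return element.parent is not None and element.parent.tag == parent_tag


def is_descendant_of(element: Element, ancestor_tag: str) -> bool:
    """Model of `ancestor_tag element`: every ancestor up to the root is checked."""
    node = element.parent
    while node is not None:
        if node.tag == ancestor_tag:
            return True
        node = node.parent
    return False


div = Element("div", children=[
    Element("span", "first span"),
    Element("span", "second span"),
    Element("ul", children=[
        Element("li", children=[Element("span", "This is the first list item.")]),
        Element("li", "This is the second list item."),
    ]),
])

child_matches = [el.text for el in walk(div) if el.tag == "span" and is_child_of(el, "div")]
descendant_matches = [el.text for el in walk(div) if el.tag == "span" and is_descendant_of(el, "div")]
print(child_matches)
print(descendant_matches)
```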

Adjacent sibling selectors

CSS 2 also introduces the concept of sibling element selectors. The adjacent sibling selector is used to select those elements whose immediately preceding sibling is a specified element. Note that the specified sibling must come directly before the targeted element, not after it.

The syntax for this selector is sibling + targeted_element. The style will be applied to targeted_element.

Consider this HTML:
<div>
    <ul>
        <li>
            <span>This is the first list item.</span>
        </li>
    </ul>
</div>
<span>First span under div</span>
<span>Second span under div</span>
and this CSS:
div+span {color:blue;}
then in the output, only the first <span> tag after the <div> is blue.
It has the style applied because its immediately preceding sibling is the <div> tag specified in the selector.

Also, the <span> tag inside the <ul> tag is not selected, since it has no siblings at all. Similarly, the final <span> tag is not selected, since its immediately preceding sibling is another <span> tag.
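Here is a minimal Python model of the adjacent sibling test. It assumes the example snippet sits directly inside <body>, since the original snippet doesn't show its parent. Element, previous_sibling and matches_adjacent are illustrative names, not a browser API.

```python
from __future__ import annotations


class Element:
    """Tiny stand-in for a DOM element: a tag name, optional text, children."""

    def __init__(self, tag: str, text: str = "", children: list[Element] | None = None):
        self.tag = tag
        self.text = text
        self.parent: Element | None = None
        self.children: list[Element] = []
        for child in children or []:
            child.parent = self
            self.children.append(child)


def walk(node: Element):
    """Yield `node` and every element below it, in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def previous_sibling(element: Element) -> Element | None:
    """The element immediately before `element` under the same parent, if any."""
    if element.parent is None:
        return None
    siblings = element.parent.children
    index = siblings.index(element)  # Element defines no __eq__, so this compares identity
    return siblings[index - 1] if index > 0 else None


def matches_adjacent(element: Element, sibling_tag: str, target_tag: str) -> bool:
    """Model of `sibling_tag + target_tag`."""
    prev = previous_sibling(element)
    return element.tag == target_tag and prev is not None and prev.tag == sibling_tag


# The example HTML, assumed to sit directly inside <body>.
body = Element("body", children=[
    Element("div", children=[
        Element("ul", children=[
            Element("li", children=[Element("span", "This is the first list item.")]),
        ]),
    ]),
    Element("span", "First span under div"),
    Element("span", "Second span under div"),
])

# div+span {color:blue;}
blue = [el.text for el in walk(body) if matches_adjacent(el, "div", "span")]
print(blue)
```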

New pseudo-elements

CSS 2 introduces 2 new pseudo-elements, ::before & ::after. These pseudo-elements are used to insert generated content just before or just after the element's own content. What's the big deal, you may ask? The deal is this: sometimes you need to render repeating content, and that content may need to appear before or after each of a set of elements. An example of such a use case would be adding a red asterisk after every label in a form to indicate required fields. Instead of writing it into the HTML, you can style it as a CSS rule.

Assume we have the following form:
<form>
    <label for=id_no class=required>Identification Number</label>
    <input type=text id=id_no name=id_no />
    <br>
    <label for=name>Name</label>
    <input type=text id=name name=name />
    <br>
    <input type=submit />
</form>
with this CSS:
.required::after
{
    content: "*";
    color: red;
}
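In the output, a red asterisk appears after the Identification Number label, since it is the only label with the required class.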

Examples of other repeating content are page numbers, chapter numbers, etc.
As with the CSS 1 pseudo-elements, CSS 2 wrote these with a single colon (:before & :after); my example code uses the CSS 3 double-colon form.

New pseudo-classes

The :lang pseudo-class

CSS 2 introduces a new pseudo-class, :lang. This pseudo-class is used to set styles based on the language of an element. The language may be set by multiple mechanisms depending upon the markup language, but in HTML, it is usually set using the lang attribute on the <html> tag, and the other elements inherit it.

Consider the HTML:
<html lang="fr">
    <head>
        <title>E:lang</title>
        <link href="lang.css" rel=stylesheet />
    </head>
    <body>
        <div>some content</div>
    </body>
</html>
and the content in lang.css:
div:lang(fr) {color:blue;}
then the output is that the text is rendered in blue colour.

If the language were changed to English ("en") in the HTML but not in the CSS file, then the text would be rendered in the browser's default colour.
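The matching rule behind :lang can be sketched in Python too. This is a simplified model under two assumptions. First, the element's language comes only from the nearest lang attribute on the element or an ancestor; real browsers can also use other sources, such as a meta element or an HTTP header. Second, following CSS 2.1, the code matches the way |= does, ignoring ASCII case. All names are hypothetical.

```python
from __future__ import annotations


class Element:
    """Tiny stand-in for a DOM element with an optional lang attribute."""

    def __init__(self, tag: str, lang: str | None = None, children: list[Element] | None = None):
        self.tag = tag
        self.lang = lang
        self.parent: Element | None = None
        self.children: list[Element] = []
        for child in children or []:
            child.parent = self
            self.children.append(child)


def language_of(element: Element) -> str:
    """Nearest lang attribute on the element or an ancestor ('' if none)."""
    node: Element | None = element
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    return ""


def matches_lang(element: Element, code: str) -> bool:
    """Model of :lang(code): the language equals `code` or starts with `code`
    followed by '-', ignoring case (str.lower() also folds non-ASCII letters,
    which doesn't matter for ASCII language codes)."""
    language = language_of(element).lower()
    code = code.lower()
    return language == code or language.startswith(code + "-")


def build_page(lang: str) -> Element:
    """The example page: <html lang=...><head/><body><div/></body></html>."""
    return Element("html", lang=lang, children=[
        Element("head"),
        Element("body", children=[Element("div")]),
    ])


for lang in ["fr", "fr-CA", "en"]:
    div = build_page(lang).children[1].children[0]
    print(lang, matches_lang(div, "fr"))
```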

The :hover & :focus pseudo-classes

The :hover pseudo-class is used to apply styling while the mouse pointer is over the element. The :focus pseudo-class is used to apply styling while the element has focus, which it can receive through a mouse click or a keyboard event.

As an example, consider the following HTML:
<input type=submit />
with the following CSS applied:
input:hover {cursor: pointer;}
input:focus {color: blue; border: 2px black solid;}
When we press the Tab key to move focus to the button, its text turns blue and it gets a 2px solid black border.

 You will also notice that if you move the mouse over the button, the mouse pointer will become the hand pointer instead of the typical arrow pointer.

Thursday, June 08, 2017

[CSS] How are conflicting styles resolved?

If you have worked in CSS, then you’ll know that you can assign a CSS property using the syntax:
property-name: value
For example, if you have a <span> tag with ID ‘content’, for which you want to assign the color green, you’d add this in your CSS file:
#content { color: green; }
There are other ways you can specify the same property:
span#content {color: green;}, and
.content {color: green;} in combination with <span class="content">lorem ipsum</span>

Here's the interview question

What happens though when you have multiple instances of the same property being set & they all apply to the same HTML tag too? Here’s an example:
Consider this tag,
<span id="content" style="color: blue;">some content</span>
while the CSS definition in the associated CSS file that can match the element is:
#content {color: green;}
Since multiple styles match, which one will the browser render? Answer: The text in the span element will be rendered in blue.
Why? Why did the browser decide to apply blue? As per the CSS spec, there are two main aspects to consider when a browser decides which of several competing styles to apply: 1) Cascading order, & 2) Specificity. We'll first look at Cascading order and later in the post, Specificity.

Cascading order

In English, the term "cascade" is used to describe a process where there are multiple steps. For example, a cascading waterfall is one in which water flows down multiple steps.
If that is the case, what does "Cascading Style Sheets" mean? What steps are there in CSS? It turns out that the style definitions for a web page can come from multiple sources, called origins. They are: author, user & user agent.
  • Author styles are those which all software developers know - they are created by the authors of the web page as CSS files or style attributes in HTML tags.
  • User styles are those styles which users of web browsers can configure on their browser. For example, users can configure their browser to replace particular fonts with other fonts - this is particularly useful from an accessibility standpoint.
  • User agent styles are those styles that are provided by default by the browser. For example, if no colour information is provided, then text is rendered black on a white background by default - this is an example of user agent styling.
The "cascade" in Cascading Style Sheets flows thus: If there are conflicts in property definitions across user, author or user agent style definitions, then the precedence is as follows:
Author > User > User agent

Example 1

In this example, we’re going to determine what happens if a user CSS file has a definition that conflicts with a definition in the user agent's default CSS file. The user agent we’re going to use is Internet Explorer. It already has a user agent CSS file (this is why a plain HTML file without any styling will render black text on a white background). We will now change the way IE renders the text color inside <div> tags by default by providing a user CSS file.
Create a file named my_style.css. The content of this file is just this one line:
div{color:red}
We will now tell IE to use this file from now on for all web pages. The way to do so is this:
  • Open Internet Explorer
  • Click on the Tools menu & choose Internet Options
  • Click on the General tab & choose Accessibility.
  • Under the User style sheet section, enable the Format documents using my style sheet checkbox.
  • Now click Browse… under the same checkbox and choose my_style.css.
  • Restart Internet Explorer.
We now need to create an HTML file that we can load into the browser to test that IE uses my_style.css. Create a file named test_my_style.html. The content of this file is:
<html>
  <head>
     <title>Testing user styles</title>
  </head>
  <body>
      <div>This is a test file to test user styles.</div>
  </body>
</html>
Opening this file in Internet Explorer renders the text in red.
What happened here? The user agent, by default, renders text inside <div> tags as black-colored text. Our user file, my_style.css, sets a conflicting color (red). IE followed the CSS spec, which states that user CSS property definitions have priority over user agent CSS property definitions, and rendered the text in red color.

Example 2

What happens if we introduce a further conflict by having an author-defined CSS file? For this, we will create another CSS file, author_style.css, where we will provide the following definition:
div {color:blue}
We will also change test_my_style.html to include author_style.css as follows.
<html>
  <head>
    <title>Testing user styles</title>
    <link href="author_style.css" rel="stylesheet">
  </head>
  <body>
    <div>This is a test file to test user styles.</div>
  </body>
</html>
Opening this file in Internet Explorer renders the text in blue.
What happened here? The user agent, by default, renders text inside <div> tags as black-colored text. Our user file, my_style.css, conflicts with that by setting red, and the author’s CSS file, author_style.css, adds a further conflict by setting blue. IE followed the CSS spec, which states that author CSS property definitions have priority over user and user agent definitions, and rendered the text in blue color.

An exception

The exception to the cascade order above is a property definition marked as !important. An !important definition beats normal definitions, and if both a user definition and an author definition are marked !important, the user definition takes precedence. CSS 2.1 does not give user agent definitions an !important level at all.
Let’s look at an example: We will reuse the same files as before, but we will change my_style.css to this:
div{color:red !important}
Now if we open our test_my_style.html in IE, the text is rendered in red.
What happened here? The user agent, by default, renders text inside <div> tags as black-colored text. Our user file, my_style.css, conflicts with that by setting red, and the author’s CSS file, author_style.css, adds a further conflict by setting blue. However, IE noticed the !important in my_style.css and followed the CSS spec, which states that user CSS property definitions marked !important have priority over all other CSS property definitions, and rendered the text in red color.
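The precedence rules from these three experiments fit in an ordered list. Below is a minimal Python sketch that encodes the CSS 2.1 cascading order, from lowest to highest: user agent, user normal, author normal, author !important, user !important. It then replays the three IE experiments. The names are hypothetical. Ties between declarations of the same rank are not handled here, since those are decided by specificity.

```python
from __future__ import annotations

# Ascending precedence in the CSS 2.1 cascading order: later entries win.
PRECEDENCE = [
    ("user agent", False),
    ("user", False),
    ("author", False),
    ("author", True),  # author !important
    ("user", True),    # user !important
]


def cascade_rank(origin: str, important: bool) -> int:
    """Position of (origin, important) in PRECEDENCE; higher wins.

    Raises ValueError for combinations CSS 2.1 does not list,
    such as an !important user agent declaration.
    """
    return PRECEDENCE.index((origin, important))


def cascade_winner(declarations: list[tuple[str, bool, str]]) -> str:
    """Return the value of the winning (origin, important, value) declaration."""
    return max(declarations, key=lambda d: cascade_rank(d[0], d[1]))[2]


example_1 = [("user agent", False, "black"), ("user", False, "red")]
example_2 = example_1 + [("author", False, "blue")]
exception = [("user agent", False, "black"), ("user", True, "red"), ("author", False, "blue")]
for declarations in (example_1, example_2, exception):
    print(cascade_winner(declarations))
```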

Specificity

The approach mentioned above still leaves room for conflicts, since a single user or author style sheet can itself contain conflicting style definitions. To resolve this, CSS provides another mechanism that browsers use: specificity. While the spec explains how to calculate specificity rather than defining the term in words, my definition is: specificity determines how specific the style definition is. As a rough intuition, specific means how many HTML elements the CSS selector matches - the fewer elements it matches, the more specific it is, and the more elements it matches, the less specific it is. The actual ranking, though, comes from counting the parts of the selector, as described below.

Calculation of specificity

The calculation of specificity is done in the following manner:
Assume there are four numbers separated by commas, and their initial values are zero:
0,0,0,0
The first number is 1 if the declaration comes from a style attribute in the element's HTML rather than from a rule with a selector, and 0 otherwise.
The second number represents the number of ID selectors in the selector.
The third number represents the number of class selectors, attribute selectors and pseudo-classes in the selector.
The fourth number represents the number of element names and pseudo-elements in the selector.
Unlike in the decimal system, if a number reaches the value 10, then it does not carry over to the preceding number. Thus, specificity values like 0,10,0,9 are perfectly valid.
Now that we know what specificity is, let’s take a look at some example CSS definitions, and try to understand what specificity value they evaluate to:
Example 1: div.content {color:red}. It is not a style attribute in an HTML tag, nor does it have any HTML IDs mentioned in the selector. Thus the first two numbers are 0,0. It has a class mentioned (.content), and it also has an HTML element mentioned (div). Thus, the final two values of the specificity are 1,1. Hence its final specificity value is 0,0,1,1.
Example 2: #content::first-letter. It is not a style attribute in an HTML tag, but it has an HTML ID mentioned in the selector. Thus the first two numbers are 0,1. It has a pseudo-element mentioned (::first-letter), and it doesn't have any HTML elements mentioned. Thus, the final two values of the specificity are 0,1. Hence its specificity value is 0,1,0,1.
Example 3: div[data-name=Tom][data-url="/member/1"]. (The URL has to be quoted because it isn't a valid CSS identifier.) It is not a style attribute in an HTML tag, nor does it have any HTML IDs mentioned in the selector. Thus the first two numbers are 0,0. It has two attributes mentioned (data-name & data-url), and it has 1 HTML element mentioned (div). Thus, the final two values of the specificity are 2,1. Hence its specificity value is 0,0,2,1.
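The counting rules above can be automated for simple selectors. Here is a rough Python sketch; the function name specificity is hypothetical. It only understands the selector parts used in this post: type, class, ID, attribute, pseudo-class and pseudo-element selectors, plus combinators. It does not handle :not(), escaped characters or namespaces. It also treats the four CSS 2 single-colon pseudo-elements as pseudo-elements.

```python
from __future__ import annotations

import re

# CSS 2 wrote these with one colon; count them as pseudo-elements either way.
LEGACY_PSEUDO_ELEMENTS = {"first-letter", "first-line", "before", "after"}

ATTRIBUTE = re.compile(r"\[[^\]]*\]")
PSEUDO_ELEMENT = re.compile(r"::[\w-]+")
PSEUDO_CLASS = re.compile(r":([\w-]+)(\([^)]*\))?")
ID = re.compile(r"#[\w-]+")
CLASS = re.compile(r"\.[\w-]+")
ELEMENT = re.compile(r"[A-Za-z][\w-]*")


def specificity(selector: str = "", from_style_attribute: bool = False) -> tuple[int, int, int, int]:
    """Return the four-number specificity described above as a tuple."""
    if from_style_attribute:
        return (1, 0, 0, 0)
    rest = selector
    # Strip attribute selectors first: their values may contain '.', '#' or ':'.
    attributes = len(ATTRIBUTE.findall(rest))
    rest = ATTRIBUTE.sub(" ", rest)
    pseudo_elements = len(PSEUDO_ELEMENT.findall(rest))
    rest = PSEUDO_ELEMENT.sub(" ", rest)
    pseudo_classes = 0
    for name, _args in PSEUDO_CLASS.findall(rest):
        if name in LEGACY_PSEUDO_ELEMENTS:
            pseudo_elements += 1
        else:
            pseudo_classes += 1
    rest = PSEUDO_CLASS.sub(" ", rest)
    ids = len(ID.findall(rest))
    rest = ID.sub(" ", rest)
    classes = len(CLASS.findall(rest))
    rest = CLASS.sub(" ", rest)
    elements = len(ELEMENT.findall(rest))  # '*' and combinators are not counted
    return (0, ids, classes + attributes + pseudo_classes, elements + pseudo_elements)


for sel in ["div.content", "#content::first-letter", 'div[data-name=Tom][data-url="/member/1"]']:
    print(sel, specificity(sel))
```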

Resolving conflicts with specificity

Given two specificity values, you can compare them to find out which one is greater. A specificity value is greater than another if its first number is greater than the other's first number. If the first numbers of both values are the same, then the browser moves on to compare the second numbers, and so on.
Here are some examples:
1,0,0,0 is greater than 0,10,0,0
0,10,0,0 is greater than 0,0,20,0
How is specificity helpful in resolving conflicts? As per the CSS spec, when competing definitions have the same origin and importance, browsers resolve the conflict by choosing the one with the higher specificity.
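The comparison rule is easy to express in code. Here is a minimal Python sketch (compare_specificity is a made-up name) that walks the four numbers from left to right. It is checked against the two examples above.

```python
from __future__ import annotations


def compare_specificity(first: tuple[int, ...], second: tuple[int, ...]) -> int:
    """Return 1 if `first` is more specific, -1 if less, 0 if they are equal.

    Numbers are compared left to right and the first difference decides;
    nothing carries over, so 0,10,0,0 is still less than 1,0,0,0.
    """
    for a, b in zip(first, second):
        if a != b:
            return 1 if a > b else -1
    return 0


print(compare_specificity((1, 0, 0, 0), (0, 10, 0, 0)))
print(compare_specificity((0, 10, 0, 0), (0, 0, 20, 0)))
```

Python's own tuple comparison works the same left-to-right way, so `(1, 0, 0, 0) > (0, 10, 0, 0)` is also True.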

Example

Let’s take the example in the interview question above:
In the HTML, we have:
<span id="content" style="color: blue;">some content</span>
while in the CSS file, we have:
#content {color: green;}
Constructing the specificity for the style definition in the HTML style attribute, we get:
1,0,0,0
Constructing the specificity for the CSS style definition, we get:
0,1,0,0
Because the first specificity value is greater than the second, the style definition in the style attribute of the HTML tag wins.

Is it possible to still have conflicts?

Yes. For example, there could be two definitions in an author CSS file which target the same elements and have the same specificity. In such cases, the CSS spec says that the definition that appears later wins.
An example:
Let’s say that we have two CSS definitions as below:
div {color:blue}
div {color:red}
for this HTML,
<html>
  <head>
     <title>Testing user styles</title>
  </head>
  <body>
     <div>This is a test file to test user styles.</div>
  </body>
</html>
Both CSS definitions evaluate to a value of 0,0,0,1.
In this case, the browser will simply render the text in red, because div {color:red} appears later.
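Here is a minimal Python sketch that combines specificity and source order for declarations from the same origin. It assumes each declaration already has its specificity and its position in the style sheet. The Declaration type and winning_value are hypothetical names. Specificity is compared first, and the position only breaks ties.

```python
from __future__ import annotations

from typing import NamedTuple


class Declaration(NamedTuple):
    selector: str
    value: str
    specificity: tuple[int, int, int, int]
    order: int  # position in the style sheet: later rules get larger numbers


def winning_value(declarations: list[Declaration]) -> str:
    """Pick by specificity first; if that ties, the later declaration wins.

    Assumes all declarations share the same origin and importance.
    """
    return max(declarations, key=lambda d: (d.specificity, d.order)).value


same_specificity = [
    Declaration("div", "blue", (0, 0, 0, 1), order=0),
    Declaration("div", "red", (0, 0, 0, 1), order=1),
]
print(winning_value(same_specificity))
```

If the specificities differ, source order never comes into play. For example, an earlier `#content` rule still beats a later `span` rule.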

Friday, October 07, 2016

Certifications - "Do they benefit me?" is the more important question

There's a lot of controversy regarding certifications.
Some people think certifications have no value. Much of this seems to stem from the possibility that a person having a certification may not actually have imbibed the knowledge for him to be effective and for his employers to reap the benefits. Some others think that like all exams, it's easy to cheat and get the certification. Some others think that the fact that some certifications have limited validity means their value to employers expires within a certain period of time. That is, unless the person demonstrates a constantly-learning attitude even without certifications coming into play, having a certification won’t help.
In my opinion, while all of these reasons may be correct, one shouldn’t ignore certifications. Here are some reasons why:
  • To folks outside the industry, certifications provide proof that you have skills and that those skills have been validated by a standards authority. To explain that sentence better, I quote here something I learnt on the Internet: anyone can drive vehicles without a license, but when you want to hire a driver, you’ll hire one with a license. When looking for a driver, you’ll avoid drivers without a license because you do not want to add to the problems you already have on your plate (like getting into accidents). The only way to achieve this is to look for someone who has had his skills validated by a standards authority (here, the government licensing authority).
  • Another reason for certifications is that they are a great way to have a deep understanding of the technology involved. If you’re like most developers, then you probably have worked in a lot of technologies over the years. After doing so for a few years, some developers decide that it’s better to focus on a technology and become an expert at it, rather than jump from one technology to the other and skim only the basics of each technology. Once such a decision is made, the best way to achieve it within a reasonable timeframe is to get a certification in the technology. To get the certification, one will have to look out for some courses/books related to the certification. These courses/books teach basic & advanced concepts and also have mock exams where you can test your skills before taking the actual exam. Taking these mock exams (with all honesty & seriousness) helps in building your understanding of the technology, leading to better career opportunities.
    A lot of folks will say, “This isn’t really different from the usual advice that developers must read books.” True, but taking a certification really makes you understand a technology, since you have to pass the exam (or at least the mock tests), rather than just reading a book & potentially forgetting the concepts later.
  • If you’re totally new to the software field, having a certification helps you get a foot in the door. It demonstrates that you made an extra effort to understand something, and that you have some basic knowledge. Keep in mind that that is all a certification can do - if you can’t code even though you have a certification, you won’t stand a chance.
I want to expand on that last point. As Jeff Atwood says, software is a field where you can expect to work on multiple technologies, and on multiple frameworks within each technology. You’re often expected to demonstrate that you can do great work using a technology you may have only a fair knowledge of. This means there’s going to be a frustrating period in which you ramp up on the technology, only to be moved to a new technology later. It also means you’re going to get co-workers who are new to the technology and are ramping up really slowly, making you wish they had read up on the basics before joining your team. In both cases, certifications come to your rescue - they ensure that you (or your co-worker) have some basic knowledge.
So to summarize, do not wish away certifications just because someone said so. Also, do not do a certification just because I said so. Think about the benefits that you get out of the certification. Think about the time invested in the certification and whether it is time that would return more value if invested elsewhere. Some certifications are valuable only for a certain time period; in that case, are you ok with your certification losing value some years down the line, or do you think you can keep updating the certification as the years go by? Think through the pros & cons from your angle (not from mine or someone else's), and take the decision that best fits you. 

Saturday, December 19, 2015

Why a mixed format is not recommended

While pairing with developers, I have often noticed that they have a tendency to periodically do a mixed format.

What is a mixed format?

Now I have no idea whether this is the official term, but here is what I mean when I say, “mixed format”. A mixed format is when a developer, working on some code, comes across some other code that is not formatted as per the project’s conventions. This code could span a few lines, or in worse cases, a whole file. The developer immediately invokes his editor’s format command, and formats the offending lines, or the whole file. With a satisfied smile on his face, the developer moves on to complete whatever work he was originally tasked to do. He then creates a commit that includes:
  • the work he was originally tasked to do, and
  • the formatting that he set right.

What’s wrong here?

Now, from the point of view of clean code and team work, formatting is not wrong. However, I do not recommend crafting a commit that mixes both format changes and logic changes, when the following conditions hold true:
  1. The format changes are not related to the actual lines that the logic change encompasses
  2. The format changes outnumber the logic changes
Why? Consider what happens when the developer checks his code in to the VCS. Other developers reviewing his commit immediately notice that it changes far too many lines. The impression this forms in the reviewer's mind can range from “Wow, this is a large commit. I need to go line by line” to simply giving up. With inexperienced or bored developers, it is usually the latter.
Also consider what happens when, sometime in the future, a developer realizes that your commit introduced a line that causes a bug. To ensure a clean fix, he opens your commit to understand what you intended to change. And he arrives at the same realisation - your commit changes too many lines. With no other choice, he is forced to go through each line to understand what it does. Imagine his frustration when most lines turn out to be formatting changes, with the actual change he’s looking for hidden among them.
The lesson here is to avoid mixing large formatting changes with logic changes. Prefer to format only those lines that your feature/bug fix already needs to change. If you can’t avoid reformatting more than that, then make two commits - one for the feature/bug changes, the other just for formatting changes.

This is only a recommendation, not a rule

As soon as you read this, please don’t fire up the comments editor or your blog editor to write a comment/blog about why I am wrong. I understand this is basically a Considered Harmful essay, and I know that Considered Harmful essays are considered harmful. With that in mind, I’ll only say that the above is a recommendation, not a rule. When making such a commit, please do think about how a future you would feel if you came across such a commit, and how you’d react. 

Sunday, December 21, 2014

Git: What are diffs and hunks?

When I was learning Git for the first time many years ago, one of the features that made me go, "Wow!! That's something I have really wanted all these years!" was the ability to choose which changes to commit among all the changes in a given file. I hadn’t seen this in the other version control systems I’d used, which were CVS and SVN.
Here’s an example of what I am trying to illustrate. Suppose I have a file named Employee.java with the following contents,
class Employee {
     private String firstName;
     private String lastName;

     Employee(String firstName, String lastName) {
          this.firstName = firstName;
          this.lastName = lastName;
     }

     public boolean equals(Object o) {
          if (!(o instanceof Employee e))   // Java 16+ pattern matching binds e when o is an Employee
               return false;
          return e.firstName.equals(this.firstName) && e.lastName.equals(this.lastName);
     }
}
Ignore the fact that there's no hashCode() implementation, please!!
You decide to add more functionality to Employee.java, namely, a grade instance variable and a toString() method that describes who the employee is and what he does. Employee.java now looks like this:

class Employee {

     private String firstName;
     private String lastName;
     private String grade;

     Employee(String firstName, String lastName, String grade) {
          this.firstName = firstName;
          this.lastName = lastName;
          this.grade = grade;
     }

     public boolean equals(Object o) {
          if (!(o instanceof Employee e))   // Java 16+ pattern matching binds e when o is an Employee
               return false;
          return e.firstName.equals(this.firstName) && e.lastName.equals(this.lastName);
     }

     public String toString() {
          return "I am " + this.firstName + " " + this.lastName + ", working as " + this.grade;
     }
}
Ignore the fact that grade is not part of equals(), please!!
When you do a git diff on Employee.java, this is what you get:

When you do a git add at this point, all the newly introduced code will be staged for commit. Let’s say you want to commit the toString() method separately. In other VCSs, that's not simple: you would have to maintain two copies of Employee.java, one introducing the grade variable and another introducing toString(). This is cumbersome, but in Git it is very easy. You just do
git add -p
which lets you choose which pieces of the change to stage for commit. For the above example, git add -p would give you


At this point, keying in 'y' will add this to the index, after which the next piece of code change is shown.


and so on…
When I learnt this, I thought, “All that’s fine, but what is the word ‘hunk’ doing in ‘Stage this hunk?’? What does it even mean?”
To understand what a hunk is, you’ll need to know more about the output of the diff command. Note that we are not talking about git diff here, but plain diff.

Understanding the diff command

diff is the Unix (and Linux) command that generates a report of the differences between two files. According to Wikipedia, given two files a and b, where b is an updated version of a, diff reports the changes that must be made to a to turn it into b.
The report that diff generates can be in 3 forms. They are: a) Edit script, b) Context format, or c) Unified format. With git diff, we get the Unified format.
The unified format, explained in short, goes like this:
The entire output of diff is called ‘diff’. That’s why people often say, “Send me the diff”. They are actually asking for the output of the diff command.
A diff begins with two lines that indicate the two files being compared. The first line begins with ‘---’ and indicates the original file, while the second line begins with ‘+++’ and indicates the newer file. Added lines are preceded with a ‘+’ symbol, deleted lines with a ‘-’ symbol, and unchanged (context) lines with a single space. A modified line is represented as a deletion of the old line followed by an addition of the new one.
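To make “the changes that must be made to a to turn it into b” concrete, here is a minimal Java sketch. LineDiff is a hypothetical class written only for illustration. It builds a longest-common-subsequence table over the lines of both files and then marks every line with the prefixes described above. Real tools such as GNU diff and Git use faster algorithms (Myers' algorithm by default), so treat this as an illustration of the output rather than of how they work internally.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal line-based diff sketch (hypothetical helper, not how diff or Git is implemented).
// It finds a longest common subsequence (LCS) of lines and reports what to
// delete from a ("-") and add from b ("+") to turn a into b; unchanged lines get " ".
final class LineDiff {

    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add(" " + a.get(i));   // unchanged (context) line
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("-" + a.get(i));   // line only in a: delete it
                i++;
            } else {
                out.add("+" + b.get(j));   // line only in b: add it
                j++;
            }
        }
        while (i < n) out.add("-" + a.get(i++));  // leftover lines of a are deletions
        while (j < m) out.add("+" + b.get(j++));  // leftover lines of b are additions
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("private String firstName;", "private String lastName;");
        List<String> b = List.of("private String firstName;", "private String lastName;",
                "private String grade;");
        diff(a, b).forEach(System.out::println);
    }
}
```

Running main prints the two unchanged field lines, each with a leading space, followed by +private String grade;. A modified line, such as the Employee constructor signature, comes out as a ‘-’ line followed by a ‘+’ line.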
Now, when a file changes, the change can be: a) in only one line, b) in consecutive lines, or c) in lines spread all over the file.
The receiver of a diff would therefore like to know which lines of the original file were changed. So the output of diff includes a special line giving the starting line position of the change in the original file, as well as its position in the destination file, followed by the actual changes. The destination position is included because earlier changes in the same diff could have pushed the original line further down the file (or up, if lines were deleted).
However, especially in open-source projects, two people may independently change the same lines of a file. When integrating these two changes, line numbers alone are not enough. You also need to provide some context, meaning a few unchanged lines before and after the changed lines. Context helps when applying conflicting changes like these, because it can be used to determine how the second change should fit in with the first.
The unified format handles both needs. It provides context around the changed lines, and it adds a special line that gives, for each file, the line number where the hunk (starting with its context) begins and how many lines of that file it covers, counting both context and changed lines. To set these special lines apart from the file’s own content, the unified format surrounds them with ‘@@’ symbols. Such lines are called range information lines. The format of a range information line is:
@@ -<starting line number in original file>,<number of lines from original file> +<starting line number in modified file>,<number of lines from modified file> @@
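A range information line is regular enough to parse with a regular expression. The sketch below assumes Java 16 or later for records, and HunkRange is a hypothetical type made up for illustration. It pulls out the four numbers. The unified format allows the “,count” part to be omitted when the count is 1, so the sketch treats a missing count as 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parsing a unified-diff range information line such as
// "@@ -12,5 +13,9 @@ class Employee {" (HunkRange is a hypothetical helper type).
record HunkRange(int oldStart, int oldCount, int newStart, int newCount) {

    // "-start[,count] +start[,count]" between "@@" markers; anything after the
    // closing "@@" (e.g. " class Employee {") is accepted and ignored.
    private static final Pattern RANGE =
            Pattern.compile("^@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@.*$");

    static HunkRange parse(String line) {
        Matcher m = RANGE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a range information line: " + line);
        }
        return new HunkRange(
                Integer.parseInt(m.group(1)), count(m.group(2)),
                Integer.parseInt(m.group(3)), count(m.group(4)));
    }

    // An omitted count means the range covers exactly one line.
    private static int count(String group) {
        return group == null ? 1 : Integer.parseInt(group);
    }
}
```

For example, HunkRange.parse("@@ -12,5 +13,9 @@ class Employee {") returns HunkRange[oldStart=12, oldCount=5, newStart=13, newCount=9].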

Understanding Employee.java diff

This should now help us understand the output of git diff that we did on Employee.java earlier. Let’s take a look at it again:

The first two lines that you see,
diff --git a/Employee.java b/Employee.java
index b2ea747..cbdaf9e 100644
are generated by Git. The actual diff output begins after them, so let's ignore these two lines and move on to the diff.

The first two lines in the diff,
--- a/Employee.java
+++ b/Employee.java
are the two files that diff is comparing. Employee.java is prefixed with ‘a/’ and ‘b/’ in the two lines because Git is comparing two versions of the same file. By default, git diff compares your working copy of Employee.java with the version in the index (the staging area), which is the same as the one in HEAD if you haven’t staged anything yet. Git labels these two versions with the made-up folders ‘a/’ and ‘b/’ just to tell them apart. If you had used plain diff, you would have passed it two files that physically exist on the filesystem.

The first range information line is:
@@ -1,6 +1,7 @@
In the range information line, “-1,6” indicates that the hunk starts at the first line of the original file and covers 6 lines of it. “+1,7” indicates that it starts at the first line of the new file and covers 7 lines of it. Why 7? Because of the addition of the grade variable, which is only present in the new file.
The second range information line is:
@@ -12,5 +13,9 @@ class Employee {
In this range information line, “-12,5” indicates that the hunk starts at the 12th line of the original file and covers 5 lines of it. “+13,9” indicates that it starts at the 13th line of the new file and covers 9 lines of it. Why does it start at line 13 in the new file? Because the grade variable added earlier pushed everything below it down by one line. Why 9 lines? Because the new toString() method, together with the blank line above it, adds 4 lines: 5 + 4 = 9. The text after the closing ‘@@’, class Employee {, is a hint Git adds to show the nearest enclosing declaration above the hunk; it is not part of the range information.
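The “why 13?” reasoning can be checked mechanically. Every earlier hunk shifts the lines after it by its new-file count minus its original-file count. Here is a small Java sketch that predicts each hunk’s starting line in the new file from the hunks before it. Hunk and HunkOffsets are hypothetical names, and the sketch assumes every hunk contains at least one line of the original file, which is the case with Git’s default three lines of context.

```java
import java.util.List;

// Sketch: predicting where a hunk starts in the new file from the hunks before it.
// Assumes every hunk covers at least one line of the original file (oldCount > 0).
final class HunkOffsets {

    // The four numbers of a range information line (hypothetical helper type).
    record Hunk(int oldStart, int oldCount, int newStart, int newCount) {}

    // Expected newStart of hunks.get(index), derived only from the earlier hunks.
    static int expectedNewStart(List<Hunk> hunks, int index) {
        int shift = 0;
        for (int k = 0; k < index; k++) {
            Hunk h = hunks.get(k);
            shift += h.newCount() - h.oldCount();  // net lines added (+) or removed (-)
        }
        return hunks.get(index).oldStart() + shift;
    }

    public static void main(String[] args) {
        List<Hunk> employeeDiff = List.of(
                new Hunk(1, 6, 1, 7),     // @@ -1,6 +1,7 @@
                new Hunk(12, 5, 13, 9));  // @@ -12,5 +13,9 @@
        System.out.println(expectedNewStart(employeeDiff, 1)); // 13
    }
}
```

For the Employee.java diff this gives 12 + (7 - 6) = 13, which matches the +13 in the second range information line.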

So what’s a hunk?

Now that you’ve understood the diff output, hunks are easy to explain. A hunk is simply a range information line together with the change lines that follow it, up to the next range information line (or the end of the diff). That’s why git add -p asks “Stage this hunk?”: it walks through your changes one hunk at a time.
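Following that definition, you can split a single file’s diff into hunks by starting a new hunk at every line that begins with “@@”. The sketch below does that, using HunkSplitter, a hypothetical name, and assuming Java 16 or later for records. It skips the header lines before the first hunk. It also counts each hunk’s lines: context plus deleted lines should add up to the original-file count in the range information line, and context plus added lines to the new-file count. A multi-file git diff would first need to be split at each diff --git line.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: splitting a single file's unified diff into hunks (hypothetical helper).
// A hunk starts at a range information line ("@@ ... @@") and runs until the next
// one or the end of the diff. Lines before the first "@@" (the "---"/"+++" header)
// belong to no hunk and are skipped.
final class HunkSplitter {

    record Hunk(String header, List<String> body) {}

    static List<Hunk> split(List<String> diffLines) {
        List<Hunk> hunks = new ArrayList<>();
        String header = null;
        List<String> body = new ArrayList<>();
        for (String line : diffLines) {
            if (line.startsWith("@@")) {
                if (header != null) {
                    hunks.add(new Hunk(header, body));  // close the previous hunk
                }
                header = line;
                body = new ArrayList<>();
            } else if (header != null) {
                body.add(line);  // context (' '), deleted ('-') or added ('+') line
            }
        }
        if (header != null) {
            hunks.add(new Hunk(header, body));  // close the last hunk
        }
        return hunks;
    }

    // Returns {lines from the original file, lines from the new file}:
    // context counts for both, '-' only for the original, '+' only for the new file.
    // Any other line is ignored.
    static int[] countLines(Hunk hunk) {
        int oldLines = 0, newLines = 0;
        for (String line : hunk.body()) {
            if (line.startsWith(" ")) {
                oldLines++;
                newLines++;
            } else if (line.startsWith("-")) {
                oldLines++;
            } else if (line.startsWith("+")) {
                newLines++;
            }
        }
        return new int[] {oldLines, newLines};
    }
}
```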

Wednesday, July 03, 2013

Restaurants: A novel way to remember orders!!

On a recent trip to the US, we often went out for lunch with our clients to various places.

One restaurant we went to seemed to be pretty popular, and there was usually a crowd during lunch. On this particular day, we sat down and placed our orders. We were a huge group, so our entire order was not easy to remember. But I remembered reading somewhere that waiters were good at remembering orders, so I didn't worry about it. "She has noted it down on a notepad anyway, so it shouldn't be a problem for her," I thought.

Our first dish arrived, carried by a different waiter from the one who took the order. She came straight to the table and placed it right in front of the person who had ordered it.

I was surprised that even though she was different from the one who took our order, she knew which customer had ordered that item. I put it down to the original waiter informing the new one of who had placed that order.

The subsequent dishes came and the same thing happened again and again. I was surprised. I looked at another table, and after some time noticed the same pattern. Different waiters would serve the same table, and each waiter knew which customer had ordered what. These same waiters were also serving other tables, and even there, they seemed to know who had ordered what.

"Can they really remember so much?" I wondered. I didn't think I could.

------------------

I forgot about the incident and was reminded of it on another day, when we went to a mobile diner of sorts. The mobile diner is just the same as the street food stalls and vans that we see in India.

I placed my order, and the lady gave me my copy of the receipt she wrote the order on. Here it is:




Notice the top row of figures?

There are various shapes with some numbers arranged around them. There is also a circled 'S' symbol.

The shapes are the tables in the restaurant. The numbers around the shapes are the seats at those tables, so each customer is identified by a number. The circled 'S' symbol is the waiter; the 'S' probably stands for "server".

When the waiter arrives to take your order, she stands in the position marked by the circled 'S'. She then notes down your order according to the position in which you sit. Thus, if you are the first on her left side, your order is marked against number 1.

This paper is then kept until the orders are ready, at which point the waiter brings the food to the table along with the paper. Since she knows which dish she is carrying, it's easy to find the customer's position on the paper, and she serves it directly to that customer!! This means any free waiter can bring the food to the table; there is no need to wait for the original waiter to serve it, or to ask the original waiter whom to serve.

I saw this for the first time in my life in the US, and am not sure whether it exists in India. In most Indian restaurants I have been to, when the waiter comes to serve me food, I am the one indicating to the waiter which food should go to whom.

Thursday, January 24, 2013

JAXB - Generating a <simpleType> with more than 256 <enumeration>s


So this was a strange error that we faced a few weeks ago.

The client we work for has various teams, each exposing its functionality to other teams via web services. So, in effect, a web application can be built that talks to various web services to get work done. Our work that day was to make a new web service. It was similar to an existing web service, with certain differences in inputs and functionality between the two. For various reasons, we decided to copy the first web service's WSDL file and change the inputs in the copy.

While we were doing so, we found that the existing WSDL had a field for accepting the country code, but its data type was marked as string. We felt that this could lead to wrong country codes in the database, as people could input any value. Our database also had a master table that stored the country codes. While the web service code did verify the input against that table, we decided to change the type from string to a simpleType restricted to a fixed list of values. This would mean that our clients would never be able to provide invalid values.

Basically, we wanted to change from:

<element name="countryCode" type="string"></element>
to
<element name="countryCode" type="CountryCode"></element>
with CountryCode type being defined thus:
<simpleType name="CountryCode">
  <restriction base="string">
    <enumeration value="IN"></enumeration>
    <enumeration value="US"></enumeration>
  </restriction>
</simpleType>

Since our database has 262 country codes, we decided to list all of them, giving 262 <enumeration> entries in CountryCode. This turned out to be less work than we initially thought, thanks to copy-paste and IntelliJ IDEA's column selection feature.

We use Apache's cxf-codegen-plugin in our project to generate the Java classes that do much of the XML-Java conversions. cxf-codegen-plugin ties into Maven's generate-sources phase to generate the Java classes. So when we ran mvn generate-sources, we expected an enum type called CountryCode with 262 fields.
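For reference, here is roughly the shape of the enum we were expecting, trimmed to the two values shown earlier. This is a hand-written sketch, not actual plugin output. The class JAXB generates also carries annotations such as @XmlType and @XmlEnum, which are left out here so that the sketch compiles with the JDK alone. The value()/fromValue() pair mirrors the helper methods xjc typically adds to the enums it generates.

```java
// Hand-written sketch of the enum expected for the CountryCode simpleType,
// trimmed to two values; JAXB annotations (@XmlType, @XmlEnum) are omitted.
enum CountryCode {
    IN,
    US;

    // The XML value of this constant, e.g. "IN".
    public String value() {
        return name();
    }

    // Converts an XML value back to a constant; an unknown code such as "XX"
    // makes valueOf throw IllegalArgumentException instead of being accepted.
    public static CountryCode fromValue(String v) {
        return valueOf(v);
    }
}
```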

In reality, the class was not generated at all.

I immediately suspected the number of enum fields, because I had never written or seen a Java enum with that many fields. So we trimmed the simpleType to one entry and ran mvn generate-sources, and the CountryCode class was generated, with one field. When we brought back the entire list, no class was generated. So we commented out the entire list and slowly uncommented entries (from the top) one by one to see at which point the error occurred. The Java file was generated fine all along until we reached the final few entries (about 6 or so). At that point, the Java file was not generated.

Again, the thought of some count limitation entered our heads. We were also entertaining the possibility that some character we pasted had a different encoding, or that some stray whitespace character had crept into our code through the copy-paste. To rule out the second possibility, we deleted the <enumeration> entries for those 6 country codes and keyed them in manually. Still, it did not work. To rule it out further, we commented out all the entries and then slowly uncommented them from the bottom up. The Java file was generated until we reached the top few entries, at which point it failed.

So we were back to our count hunch.

We wondered whether WSDL had an issue with so many <enumeration>s. We didn't think so, but we decided to check anyway. The WSDL spec did not mention any limit on the number of <enumeration> tags in a <simpleType>. So we felt it had to be an issue with either the cxf-codegen-plugin or Java. Googling revealed that Java had a limit on the number of fields in an enum, and that was 65535. Since we were far below this, we ruled out Java as the problem.

So the only suspect left was the cxf-codegen-plugin. Googling revealed that it internally made use of JAXB. Further Googling brought up a page which said that you had to add the typesafeEnumMaxMembers attribute to your <globalBindings> tag to allow JAXB to generate an enum type with more than 256 elements. This <globalBindings> tag is present in the bindings.xjb file in our project. We set typesafeEnumMaxMembers to 300 and found that we were able to generate the CountryCode.java file, with all 262 enum elements!!

<globalBindings typesafeEnumMaxMembers="300"/>
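Putting the numbers together, the behaviour we saw can be modelled in a few lines. This is a simplified sketch of the rule as we understand it from the JAXB spec, not xjc's actual code. A simpleType whose number of <enumeration> facets exceeds typesafeEnumMaxMembers is not bound to an enum; its base type (String, for CountryCode) is used instead, so no enum class is generated. EnumBindingModel and bindsToEnum are hypothetical names.

```java
// Simplified model of the JAXB binding rule; NOT xjc's real code.
// EnumBindingModel and bindsToEnum are hypothetical names used for illustration.
final class EnumBindingModel {

    // Default value of typesafeEnumMaxMembers according to the JAXB spec.
    static final int DEFAULT_TYPESAFE_ENUM_MAX_MEMBERS = 256;

    // A simpleType is bound to a Java enum only if its number of <enumeration>
    // facets does not exceed typesafeEnumMaxMembers; otherwise the base type
    // (String for CountryCode) is used and no enum class is generated.
    static boolean bindsToEnum(int enumerationFacets, int typesafeEnumMaxMembers) {
        return enumerationFacets <= typesafeEnumMaxMembers;
    }

    public static void main(String[] args) {
        int countryCodes = 262;
        System.out.println(bindsToEnum(countryCodes, DEFAULT_TYPESAFE_ENUM_MAX_MEMBERS)); // false
        System.out.println(bindsToEnum(256, DEFAULT_TYPESAFE_ENUM_MAX_MEMBERS));          // true
        System.out.println(bindsToEnum(countryCodes, 300));                               // true
        // 262 - 256 = 6 entries over the limit, matching the "about 6" entries
        // at which generation stopped working.
        System.out.println(countryCodes - DEFAULT_TYPESAFE_ENUM_MAX_MEMBERS);             // 6
    }
}
```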

This was a great relief, since we had been Googling for many hours and had become frustrated. Googling further, we learnt more about JAXB and the xjc tool. I was aware that JAXB could be used to convert XML to Java and vice versa, but I had never really delved into it. Hence xjc was new to me. In the end, I understood that it was xjc that did the job of generating the Java classes. You can customise the way xjc generates the classes by creating an external bindings file, which by convention has the extension '.xjb'.

And that's where the file 'bindings.xjb' in our project came in. You can tell JAXB about this bindings file by passing its name to the -b parameter of the xjc command. Since we were using the cxf-codegen-plugin and not the xjc command directly, we configured these arguments via the <executions> tag of the <plugin> tag in pom.xml. Basically, we did this:


<plugin>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-codegen-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>wsdl2java</goal>
      </goals>
      <configuration>
        <defaultOptions>
          <!-- pass our external JAXB bindings file to wsdl2java with the -b option -->
          <extraargs>
            <extraarg>-b</extraarg>
            <extraarg>${basedir}/src/main/resources/bindings.xjb</extraarg>
          </extraargs>
        </defaultOptions>
      </configuration>
    </execution>
  </executions>
</plugin>



One thing that made us wonder was why there was a limit in the first place, and why the default value was 256. We were not able to find any answers, but the JAXB spec itself lists the default value as 256. I read somewhere on the Internet that this was because having a Java enum with 256 entries is unmanageable and unmaintainable. But we felt that even 100 - 200 entries would be unmanageable - in that case, why isn't the default value somewhere between 100 and 200? Why specifically 256?