Thoughts on software: programming


Thursday, June 08, 2017

[CSS] How are conflicting styles resolved?

If you have worked in CSS, then you’ll know that you can assign a CSS property using the syntax:

property-name: value

For example, if you have a <span> tag with ID ‘content’ to which you want to assign the color green, you’d add this to your CSS file:

#content { color: green; }

There are other ways you can specify the same property:

span#content {color: green;}, and

.content {color: green;} in combination with <span class="content">lorem ipsum</span>

Here's the interview question

What happens, though, when the same property is set in multiple places & all of them apply to the same HTML tag? Here’s an example:

Consider this tag,

<span id="content" style="color: blue;">some content</span>

while the CSS definition in the associated CSS file that can match the element is:

#content {color: green;}

Since multiple styles match, which one will the browser render? Answer: The text in the span element will be rendered in blue.

Why? Why did the browser decide to apply blue? As per the CSS spec, there are two main aspects to consider when a browser decides which of several competing styles to apply: 1) Cascading order, & 2) Specificity. We'll first look at Cascading order and later in the post, Specificity.

Cascading order

In English, the term "cascade" is used to describe a process where there are multiple steps. For example, a cascading waterfall is one in which water flows down multiple steps.

If that is the case, what does "Cascading Style Sheets" mean? What steps are there in CSS? It turns out that style definitions for a web page can come from three sources, which the spec calls origins: author, user & user agent.

Author styles are those which all software developers know - they are created by the authors of the web page as CSS files or style attributes in HTML tags.
User styles are those styles which users of web browsers can configure on their browser. For example, users can configure their browser to replace particular fonts with other fonts - this is particularly useful from an accessibility standpoint.
User agent styles are those styles that are provided by default by the browser. For example, if no colour information is provided, then text is rendered black on a white background by default - this is an example of user agent styling.

The "cascade" in Cascading Style Sheets flows thus: If there are conflicts in property definitions across user, author or user agent style definitions, then the precedence is as follows:

Author > User > User agent

Example 1

In this example, we’re going to determine what happens if a user CSS file has a definition that conflicts with a definition in the user agent's default CSS file. The user agent we’re going to use is Internet Explorer. It already has a user agent CSS file (this is why a plain HTML file without any styling renders black text on a white background). We will now change the color IE uses by default for text inside <div> tags by providing a user CSS file.
Create a file named my_style.css. Its content is just this one line:

div{color:red}

We will now tell IE to use this file from now on for all web pages. The way to do so is this:

Open Internet Explorer
Click on the Tools menu & choose Internet Options
Click on the General tab & choose Accessibility. The Accessibility dialog opens.
Under the User style sheet section, enable the Format documents using my style sheet checkbox.
Now click Browse… under the same checkbox and choose my_style.css.
Restart Internet Explorer.

We now need to create an HTML file that we can load into the browser to test that IE uses my_style.css. Create a file named test_my_style.html with this content:

<html>
<head>
<title>Testing user styles</title>
</head>
<body>
<div>This is a test file to test user styles.</div>
</body>
</html>

Opening this file in Internet Explorer renders the text in red.

What happened here? The user agent, by default, renders text inside <div> tags as black. Our user file, my_style.css, overrode that, thus creating a conflict. IE followed the CSS spec, which states that user CSS property definitions have priority over user agent CSS property definitions, and rendered the text in red.

Example 2

What happens if we introduce a further conflict by having an author-defined CSS file? For this, we will create another CSS file, author_style.css, where we will provide the following definition:

div {color:blue}

We will also change test_my_style.html to include author_style.css as follows.

<html>
<head>
<title>Testing user styles</title>
<link href="author_style.css" rel="stylesheet">
</head>
<body>
<div>This is a test file to test user styles.</div>
</body>
</html>

Opening this file in Internet Explorer now renders the text in blue.

What happened here? The user agent, by default, renders text inside <div> tags as black. Our user file, my_style.css, overrode that, thus creating a conflict. The author’s CSS file, author_style.css, overrode it in turn, setting up another conflict. IE followed the CSS spec, which states that author CSS property definitions have priority over user and user agent definitions that are not marked !important, and rendered the text in blue.

An exception

The only exception to the cascade order above is when a property definition is marked !important. Important definitions take precedence over normal ones, and between important definitions the order of author and user is reversed: a user !important definition beats an author !important definition. The CSS 2.1 spec does not define an !important level for user agent style sheets.

Let’s look at an example: We will reuse the same files as before, but we will change my_style.css to this:

div{color:red !important}

Now if we open our test_my_style.html in IE, the text is rendered in red again.

What happened here? The user agent, by default, renders text inside <div> tags as black. Our user file, my_style.css, overrode that, thus creating a conflict. The author’s CSS file, author_style.css, overrode it in turn, setting up another conflict. However, IE noticed the !important in my_style.css and followed the CSS spec, which states that user CSS property definitions marked !important have priority over all other CSS property definitions, and rendered the text in red.
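To make the order concrete, here is a minimal Java sketch of these rules for a single property. It assumes the CSS 2.1 ordering and models only origin and !important (specificity comes next); the class, enum and method names are hypothetical and not part of any browser API.

```java
import java.util.Comparator;
import java.util.List;

public class CascadeOrder {
    // The three origins described above.
    enum Origin { USER_AGENT, USER, AUTHOR }

    // One definition of a single property, e.g. "color: red !important" from the user sheet.
    record Declaration(String value, Origin origin, boolean important) {}

    // Ascending CSS 2.1 order: user agent < user normal < author normal
    // < author !important < user !important. A higher rank wins.
    static int rank(Declaration d) {
        return switch (d.origin()) {
            case USER_AGENT -> 1;                 // CSS 2.1 defines no !important level here
            case USER -> d.important() ? 5 : 2;
            case AUTHOR -> d.important() ? 4 : 3;
        };
    }

    // Picks the declaration with the highest rank.
    static Declaration winner(List<Declaration> candidates) {
        return candidates.stream()
                .max(Comparator.comparingInt(CascadeOrder::rank))
                .orElseThrow();
    }

    public static void main(String[] args) {
        Declaration ua = new Declaration("black", Origin.USER_AGENT, false);
        Declaration user = new Declaration("red", Origin.USER, false);
        Declaration author = new Declaration("blue", Origin.AUTHOR, false);
        Declaration userImportant = new Declaration("red", Origin.USER, true);

        System.out.println(winner(List.of(ua, user)).value());                  // red  (Example 1)
        System.out.println(winner(List.of(ua, user, author)).value());          // blue (Example 2)
        System.out.println(winner(List.of(ua, userImportant, author)).value()); // red  (!important)
    }
}
```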

Specificity

The cascade order alone can still leave conflicts, because the style sheets of a single origin (for example, the author's) can contain several conflicting definitions. To resolve these, CSS provides another mechanism that browsers use: specificity. The spec describes how to calculate specificity rather than what the word means, so here is my definition: specificity determines how specific a style definition is, that is, how narrowly its selector targets elements. Intuitively, a selector that matches fewer elements tends to be more specific, and one that matches more elements tends to be less specific. Browsers do not actually count matched elements, though; they compute specificity from the selector itself, as described below.

Calculation of specificity

The calculation of specificity is done in the following manner:

Assume there are four numbers separated by commas, and their initial values are zero:

0,0,0,0

The first number represents whether the declaration comes from a style attribute in the element's HTML rather than from a rule with a selector. If it does, the first number is 1; otherwise it is 0.

The second number represents the number of ID selectors (such as #content) in the selector.

The third number represents the number of class selectors (such as .content), attribute selectors (such as [data-name]) and pseudo-classes (such as :hover) in the selector.

The fourth number represents the number of element names (such as div) and pseudo-elements (such as ::first-letter) in the selector.

Unlike in the decimal system, if a number reaches the value 10, then it does not carry over to the preceding number. Thus, specificity values like 0,10,0,9 are perfectly valid.

Now that we know what specificity is, let’s take a look at some example CSS definitions, and try to understand what specificity value they evaluate to:

Example 1: div.content {color:red}. It is not a style attribute in an HTML tag, nor does it have any HTML IDs mentioned in the selector. Thus the first two numbers are 0,0. It has a class selector (.content), and it also has an HTML element mentioned (div). Thus, the final two values of the specificity are 1,1. Hence its final specificity value is 0,0,1,1.

Example 2: #content::first-letter. It is not a style attribute in an HTML tag, but it has an HTML ID mentioned in the selector. Thus the first two numbers are 0,1. It has no class, attribute or pseudo-class selectors, and it has one pseudo-element (::first-letter) but no HTML elements. Thus, the final two values of the specificity are 0,1. Hence its specificity value is 0,1,0,1.

Example 3: div[data-name=Tom][data-url="/member/1"]. It is not a style attribute in an HTML tag, nor does it have any HTML IDs mentioned in the selector. Thus the first two numbers are 0,0. It has two attribute selectors (data-name & data-url), and it has 1 HTML element mentioned (div). Thus, the final two values of the specificity are 2,1. Hence its specificity value is 0,0,2,1.
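The counting rules above can be sketched as a small Java method. This is a naive, hypothetical helper (SpecificityCalc.specificity is my own name, not a library API): it assumes simple selectors and does not handle :not(), :is(), :where(), escapes or namespaces, which the spec treats specially. It reproduces the three examples.

```java
import java.util.Arrays;
import java.util.Set;

public class SpecificityCalc {
    // Legacy pseudo-elements that CSS also accepts with a single colon.
    private static final Set<String> LEGACY_PSEUDO_ELEMENTS =
            Set.of("before", "after", "first-line", "first-letter");

    // Returns {a, b, c, d}. a is always 0 here, because a selector is never a style attribute.
    static int[] specificity(String selector) {
        int[] s = new int[4];
        int i = 0;
        int n = selector.length();
        while (i < n) {
            char ch = selector.charAt(i);
            if (ch == '#') {                        // ID selector
                s[1]++;
                i = skipName(selector, i + 1);
            } else if (ch == '.') {                 // class selector
                s[2]++;
                i = skipName(selector, i + 1);
            } else if (ch == '[') {                 // attribute selector
                s[2]++;
                i = skipPast(selector, i, ']');
            } else if (ch == ':') {
                boolean doubleColon = i + 1 < n && selector.charAt(i + 1) == ':';
                int start = i + (doubleColon ? 2 : 1);
                int end = skipName(selector, start);
                String name = selector.substring(start, end);
                if (doubleColon || LEGACY_PSEUDO_ELEMENTS.contains(name)) {
                    s[3]++;                         // pseudo-element
                } else {
                    s[2]++;                         // pseudo-class
                }
                i = end;
                if (i < n && selector.charAt(i) == '(') {
                    i = skipPast(selector, i, ')'); // ignore arguments, e.g. :nth-child(2n+1)
                }
            } else if (Character.isLetter(ch)) {    // element name (type selector)
                s[3]++;
                i = skipName(selector, i);
            } else {
                i++;                                // whitespace, combinators (> + ~), '*'
            }
        }
        return s;
    }

    // Advances past an identifier made of letters, digits, '-' and '_'.
    private static int skipName(String s, int i) {
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '-' && c != '_') {
                break;
            }
            i++;
        }
        return i;
    }

    // Advances past the next occurrence of 'close' that is not inside quotes.
    private static int skipPast(String s, int i, char close) {
        char quote = 0;
        for (i = i + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == close) {
                return i + 1;
            }
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(specificity("div.content")));            // [0, 0, 1, 1]
        System.out.println(Arrays.toString(specificity("#content::first-letter"))); // [0, 1, 0, 1]
        System.out.println(Arrays.toString(
                specificity("div[data-name=Tom][data-url=\"/member/1\"]")));        // [0, 0, 2, 1]
    }
}
```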

Resolving conflicts with specificity

Given two specificity values, you can compare them to find out which one is greater or lesser. A specificity value is greater than another specificity value if the first specificity’s first number is greater than the second specificity’s first number. In case the first number of both values are the same, then the browser moves on to compare the second number of both specificity values, and so on.

Here are some examples:

1,0,0,0 is greater than 0,10,0,0

0,10,0,0 is greater than 0,0,20,0
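Comparing two specificity values is a left-to-right comparison, one position at a time. A minimal Java sketch (SpecificityCompare.compare is a hypothetical helper):

```java
public class SpecificityCompare {
    // Compares two {a, b, c, d} specificity values position by position, left to right.
    // Each position is an independent count, so a value of 10 never carries into the next one.
    // Returns a positive number if x is more specific, a negative one if y is, 0 on a tie.
    static int compare(int[] x, int[] y) {
        for (int i = 0; i < 4; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compare(new int[]{1, 0, 0, 0}, new int[]{0, 10, 0, 0}) > 0); // true
        System.out.println(compare(new int[]{0, 10, 0, 0}, new int[]{0, 0, 20, 0}) > 0); // true
        // The interview question: style attribute (1,0,0,0) vs. #content (0,1,0,0)
        System.out.println(compare(new int[]{1, 0, 0, 0}, new int[]{0, 1, 0, 0}) > 0);  // true
    }
}
```

In Java 9 and later, Arrays.compare(int[], int[]) performs the same lexicographic comparison for arrays of equal length.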

How is specificity helpful in resolving conflicts? As per the CSS spec, browsers are supposed to resolve conflicts by choosing those CSS definitions that have a higher specificity.

Example

Let’s take the example in the interview question above:

In the HTML, we have:

<span id="content" style="color: blue;">some content</span>

while in the CSS file, we have:

#content {color: green;}

Constructing the specificity for the style definition in the HTML style attribute, we get:

1,0,0,0

Constructing the specificity for the CSS style definition, we get:

0,1,0,0

Because the first specificity value is greater than the second, the style definition in the style attribute of the HTML tag wins.

Is it possible to still have conflicts?

Yes. For example, there could be two definitions in an author CSS file which target the same elements and have the same specificity. In such cases, the CSS spec says that the definition that appears later wins.

An example:

Let’s say that we have two CSS definitions as below:

div {color:blue}

div {color:red}

for this HTML,

<html>
<head>
<title>Testing user styles</title>
</head>
<body>
<div>This is a test file to test user styles.</div>
</body>
</html>

Both CSS definitions evaluate to a value of 0,0,0,1.

In this case, the browser simply renders the text in red, because div {color:red} appears later.
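Putting the last two rules together for definitions from the same origin and with the same importance: the higher specificity wins, and on a tie the later definition wins. A minimal Java sketch, assuming the specificity values have already been computed and that order records each rule's position in the style sheet (Rule and winner are hypothetical names):

```java
import java.util.Comparator;
import java.util.List;

public class SourceOrderTieBreak {
    // One author definition: its value, its selector's specificity {a, b, c, d},
    // and its position in the style sheet (0 = first).
    record Rule(String value, int[] specificity, int order) {}

    // Left-to-right comparison of two specificity values.
    static int compareSpecificity(int[] x, int[] y) {
        for (int i = 0; i < 4; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // Higher specificity wins; on a tie, the rule that appears later wins.
    static Rule winner(List<Rule> rules) {
        Comparator<Rule> bySpecificity =
                (r1, r2) -> compareSpecificity(r1.specificity(), r2.specificity());
        return rules.stream()
                .max(bySpecificity.thenComparingInt(Rule::order))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("blue", new int[]{0, 0, 0, 1}, 0),  // div {color:blue}
                new Rule("red", new int[]{0, 0, 0, 1}, 1));  // div {color:red}
        System.out.println(winner(rules).value()); // red
    }
}
```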

Friday, October 07, 2016

Certifications - "Do they benefit me?" is the more important question

There's a lot of controversy regarding certifications.

Some people think certifications have no value. Much of this seems to stem from the possibility that a person with a certification may not actually have absorbed enough knowledge to be effective, so employers may not reap the benefits. Others think that, as with all exams, it's easy to cheat and get the certification. Still others think that because some certifications have limited validity, their value to employers expires within a certain period of time. That is, unless the person demonstrates a constantly-learning attitude even without certifications coming into play, having a certification won’t help.

In my opinion, while all of these reasons may be correct, one shouldn’t ignore certifications. Here are some reasons why:

To folks outside the industry, certifications provide proof that you have skills and that those skills have been validated by a standards authority. To explain that sentence better, I quote here something I learnt on the Internet: anyone can drive vehicles without a license, but when you want to hire a driver, you’ll hire one with a license. When looking for a driver, you’ll avoid hiring one without a license because you do not want to add to the problems you already have on your plate (like getting into accidents). The only way to achieve this is to look for someone who has had his skills validated by a standards authority (here, the government licensing authority).
Another reason for certifications is that they are a great way to gain a deep understanding of the technology involved. If you’re like most developers, then you probably have worked in a lot of technologies over the years. After doing so for a few years, some developers decide that it’s better to focus on a technology and become an expert at it, rather than jump from one technology to the other and skim only the basics of each technology. Once such a decision is made, the best way to achieve it within a reasonable timeframe is to get a certification in the technology. To get the certification, one will have to look for some courses/books related to the certification. These courses/books teach basic & advanced concepts and also have mock exams where you can test your skills before taking the actual exam. Taking these mock exams (with all honesty & seriousness) helps in building your understanding of the technology, leading to better career opportunities.
A lot of folks will say, “This isn’t really different from the usual advice that developers must read books.” True, but taking a certification really makes you understand a technology, since you have to pass the exam (or at least the mock tests), rather than just reading a book & potentially forgetting the concepts later.
If you’re totally new to the software field, having a certification helps you get a foot in the door. It demonstrates that you took extra effort to understand something, and that you have some basic knowledge. Keep in mind that that is all a certification can do - if you can’t code even though you have a certification, you won’t stand a chance.

I want to expand on that last point. As Jeff Atwood says, software is a field where you can expect to work on multiple technologies & multiple frameworks within each technology. You’re often expected to demonstrate that you can do great work using a technology you may have only a fair knowledge of. This means there’s going to be a frustrating period in which you ramp up on the technology only to be moved to a new technology later. It also means you’re going to get co-workers who are new to the technology, and are ramping up really slowly, making you wish they had read up on the basics before joining your team. In both cases, certifications come to your rescue - they guarantee that there is some basic knowledge that you (or your co-worker) have.

So to summarize, do not wish away certifications just because someone said so. Also, do not do a certification just because I said so. Think about the benefits that you get out of the certification. Think about the time invested in the certification and whether it is time that would return more value if invested elsewhere. Some certifications are valuable only for a certain time period; in that case, are you ok with your certification losing value some years down the line, or do you think you can keep updating the certification as the years go by? Think through the pros & cons from your angle (not from mine or someone else's), and take the decision that best fits you.

Tuesday, November 17, 2009

About integer overflows...

One fine day, when I was browsing StackOverflow as usual, I came across this particular question. The gist of the question was this:

A person in an interview asked me how I would generate all possible integer values in Java.

Some people, including me, thought it was a simple question. All you have to do is this, right?

  for (int i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
    System.out.println(i);
  }

(I confess I did not readily think of using the constants Integer.MIN_VALUE and Integer.MAX_VALUE. In my mind, the first program that hit me used the actual values of Integer.MIN_VALUE and Integer.MAX_VALUE. But reading through some of the comments to the question, I quickly realized I could use them.)

Simple right?

Wrong, as I found out later.

It turns out the program will run fine until the variable i equals Integer.MAX_VALUE. At that point, System.out.println() prints the value of i, which is Integer.MAX_VALUE, and control exits the current iteration of the for loop. Now i is incremented. What will the value of i be now? I expected it to be Integer.MAX_VALUE + 1. But no, it's Integer.MIN_VALUE!! Since Integer.MIN_VALUE is less than Integer.MAX_VALUE, the condition i <= Integer.MAX_VALUE still holds. In fact, no int can ever be greater than Integer.MAX_VALUE, so that condition can never become false.

The outcome, sweetie, is that it's an infinite for loop.

You can test this out for yourself by executing the above program. But wait... surely you are not going to wait for the program to run through all values from Integer.MIN_VALUE to Integer.MAX_VALUE? That's 4294967296 numbers!! When will your program finish executing? Instead, you can try out this simple program:

int i = Integer.MAX_VALUE;
System.out.println((i + 1) == Integer.MIN_VALUE);

Executing it would print "true".

So to answer the interview question, the correct code to print all values that an int can store is:

for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
  System.out.println(i); 
}
System.out.println(Integer.MAX_VALUE);
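Waiting for four billion lines of output is not a practical way to check the fix. A quick sanity check, assuming the same loop shape carries over to any fixed-size integer type, is to apply it to byte, which has only 256 values (-128 to 127). The class and method names are hypothetical:

```java
public class AllByteValues {
    // Same shape as the corrected int loop, applied to byte so it finishes instantly.
    // Returns how many values were visited.
    static int countAll() {
        int count = 0;
        for (byte b = Byte.MIN_VALUE; b < Byte.MAX_VALUE; b++) {
            count++;                 // visits -128 .. 126
        }
        count++;                     // Byte.MAX_VALUE (127), handled after the loop
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAll()); // 256
    }
}
```

Changing the condition to b <= Byte.MAX_VALUE reproduces the original bug in miniature: b wraps from 127 back to -128 and the loop never ends.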

The bits deep under it all...

"But, but", I thought, "how can incrementing a variable that holds Integer.MAX_VALUE result in Integer.MIN_VALUE?"

To understand why this occurs, we must dive deep into the world of bits and bytes. Java defines the int data type as exactly 32 bits wide. Thus, the lowest value that it can hold is given by Integer.MIN_VALUE, which is actually -2147483648, which in binary is represented as 1000 0000 0000 0000 0000 0000 0000 0000. (You don't have to struggle a lot to get the binary representation of a number. Java provides the Integer.toBinaryString(int) method. Another way to get the binary representation is to open Calculator in Windows, type in the number and click on the 'Bin' radio button.)

In case you have forgotten, in Java, of the 32 bits for an int, the leftmost bit (aka the higher order bit) indicates the sign, with 0 being positive and 1 being negative. That is why in the previous value, the higher order bit is 1. So basically Java uses only 31 bits to store the number.
"If that is the case", you ask, "how come the other digits are all 0? If all digits are 0 in binary, then shouldn't the value be 0 in decimal rather than -214 whatever? Are you doing something wrong?" Nope. Java stores int values in two's complement form. In case two's complement is new to you, please read up on it before going ahead with this blog post.

Similarly, the highest value an int can hold is 2147483647, which in binary is 0111 1111 1111 1111 1111 1111 1111 1111. Notice something about the higher order bit?
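Integer.toBinaryString drops leading zeros, so Integer.MAX_VALUE comes back as only 31 digits. A small Java helper (bits32 is a hypothetical name) pads the result to 32 bits and groups it the way this post does:

```java
public class BinaryView {
    // Pads Integer.toBinaryString to the full 32 bits and groups it into nibbles.
    static String bits32(int value) {
        String raw = String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
        StringBuilder grouped = new StringBuilder();
        for (int i = 0; i < 32; i += 4) {
            if (i > 0) {
                grouped.append(' ');
            }
            grouped.append(raw, i, i + 4);
        }
        return grouped.toString();
    }

    public static void main(String[] args) {
        System.out.println(bits32(Integer.MIN_VALUE)); // 1000 0000 0000 0000 0000 0000 0000 0000
        System.out.println(bits32(Integer.MAX_VALUE)); // 0111 1111 1111 1111 1111 1111 1111 1111
    }
}
```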

Now that we have established the basics, let's go back to our original question - when you increment Integer.MAX_VALUE, why does it revert back to Integer.MIN_VALUE? Let's do the arithmetic and see what we get...


 

  Addend:       0111 1111 1111 1111 1111 1111 1111 1111   (Integer.MAX_VALUE)
  Addend:     + 0000 0000 0000 0000 0000 0000 0000 0001   (1)
                ---------------------------------------
  Sum:          1000 0000 0000 0000 0000 0000 0000 0000

The result of adding 1 to Integer.MAX_VALUE is 1000 0000 0000 0000 0000 0000 0000 0000, which is the value we previously got for Integer.MIN_VALUE!!

For the same reason, decrementing Integer.MIN_VALUE gives Integer.MAX_VALUE:


 

  Minuend:      1000 0000 0000 0000 0000 0000 0000 0000   (Integer.MIN_VALUE)
  Subtrahend: - 0000 0000 0000 0000 0000 0000 0000 0001   (1)
                ---------------------------------------
  Difference:   0111 1111 1111 1111 1111 1111 1111 1111
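Both calculations can be confirmed in Java. The Java Language Specification defines this behavior: when int addition or subtraction overflows, the result is the low-order 32 bits of the mathematical result. The sketch below also shows the "lost" carry by doing the addition in 64 bits first; the class and method names are hypothetical:

```java
public class WrapAround {
    static int increment(int x) { return x + 1; }
    static int decrement(int x) { return x - 1; }

    public static void main(String[] args) {
        // int arithmetic keeps only the low-order 32 bits of the true result.
        System.out.println(increment(Integer.MAX_VALUE) == Integer.MIN_VALUE); // true
        System.out.println(decrement(Integer.MIN_VALUE) == Integer.MAX_VALUE); // true

        // The same addition done in 64 bits gives the mathematically correct sum...
        long exact = (long) Integer.MAX_VALUE + 1;
        System.out.println(exact);                            // 2147483648
        // ...and narrowing it back to int discards everything above the low 32 bits.
        System.out.println((int) exact == Integer.MIN_VALUE); // true
    }
}
```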

This is also the case in other languages that have fixed-size integer datatypes. Let's take up C, which has such datatypes just like Java. In C, the maximum value of an int is given by INT_MAX, defined in limits.h. Here's a program that stores the value of INT_MAX in a variable and tries to increment it.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main() {
  int max = INT_MAX; 
  printf("Max int value: %d\n", max);
  max++; /* signed overflow is undefined behavior in C; on typical platforms it wraps */
  printf("Now value is %d\n", max);
  printf("Min int value: %d\n", INT_MIN);
  printf("Are they both equal?: %d\n", (max == INT_MIN));
  return 0;
}

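When you execute this program (compiled without optimizations on a typical platform with a 32-bit int), you will most likely find that max initially holds the value 2147483647, and that after max++ it becomes -2147483648. Be careful, though: unlike Java, C leaves signed integer overflow undefined, so the standard does not guarantee this result, and an optimizing compiler is allowed to assume it never happens.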

Thus, this 'problem' exists in C too!! In fact, I would go so far as to say that it exists in most programming languages that use fixed-size integers. This problem exists because we have reached the limit of what the language can store for that data type. To store numbers beyond that limit, the language would have to treat bits beyond that limit as part of the numerical value, which it doesn't do. As an example, consider the case where we increment Integer.MAX_VALUE by 1 in the Java code seen above. Integer.MAX_VALUE already has 31 '1' bits and its 32nd bit (the higher order bit) is 0. When we increment, the 31 '1' bits become '0' bits and the 32nd bit becomes '1'. But Java and C do not treat the new higher order bit as part of the magnitude of the incremented value. Instead, they read it as the sign bit, which in two's complement carries a weight of -2147483648 (that is, -2^31). With the other 31 bits all '0', the result is -2147483648, which is Integer.MIN_VALUE!!
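The weight of the sign bit can be checked directly. This hypothetical helper rebuilds an int's value bit by bit, giving bit 31 a weight of -2^31 and every other bit i a weight of +2^i, which is how two's complement reads the 32 bits:

```java
public class TwosComplement {
    // Rebuilds the value of an int from its 32 bits using two's complement weights.
    static long valueOf(int bits) {
        long value = 0;
        for (int i = 0; i < 31; i++) {
            if (((bits >>> i) & 1) == 1) {
                value += 1L << i;          // ordinary bits: +2^i
            }
        }
        if (((bits >>> 31) & 1) == 1) {
            value -= 1L << 31;             // sign bit: -2^31
        }
        return value;
    }

    public static void main(String[] args) {
        int afterIncrement = 0b1000_0000_0000_0000_0000_0000_0000_0000; // MAX_VALUE + 1
        System.out.println(valueOf(afterIncrement));    // -2147483648
        System.out.println(valueOf(Integer.MAX_VALUE)); // 2147483647
        System.out.println(valueOf(-1));                // -1
    }
}
```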

What if we want the language to recognize those higher bits as part of the numeric value? We would have to use datatypes that store numbers using more bits. An example is long, which stores values using 64 bits. Using long, the solution we posted above becomes:

for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
  System.out.println(i);
}

Since a long is 64 bits, every int value easily fits into it, and i can reach Integer.MAX_VALUE + 1 without wrapping, so the loop ends without the extra printing of Integer.MAX_VALUE. But even long has a limit beyond which the value wraps around to the lowest number the datatype can store. Because long stores numbers using 64 bits, the highest value it can store is 9223372036854775807.
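A quick Java check of the long limit mentioned above (the class and the increment helper are hypothetical names):

```java
public class LongLimits {
    static long increment(long x) { return x + 1; }

    public static void main(String[] args) {
        System.out.println(Long.MAX_VALUE);                              // 9223372036854775807
        System.out.println(increment(Long.MAX_VALUE) == Long.MIN_VALUE); // true: long wraps too

        // Every int fits in a long, so stepping past Integer.MAX_VALUE no longer wraps.
        long i = Integer.MAX_VALUE;
        System.out.println(i + 1 > Integer.MAX_VALUE);                   // true
    }
}
```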

"Ok," you say, "just for the purposes of making our understanding concrete, let us assume that a language has a 32-bit integer datatype that stores only positive values. In such a case, there is no need for the sign bit - all values of that type are positive anyway. Would it then be possible to increment a variable that holds a value of 2147483647 and still get the right answer?"

Actually, C does have such a datatype - it's called unsigned int. And yes, you can go beyond INT_MAX. Check out this program:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main() {
  unsigned int value = INT_MAX;
  printf("value is: %u\n", value);
  value++;
  printf("incremented value is: %u\n", value);
  return 0;
}

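When you increment, the value of value becomes 2147483648. Note that unsigned int has a limit too: UINT_MAX, which is 4294967295 where unsigned int is 32 bits wide. Incrementing past it wraps around to 0, and unlike signed overflow, this wrap-around is well defined in C.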
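Java has no unsigned int type, but since Java 8 it can read the same 32 bits as an unsigned number with Integer.toUnsignedString. The sketch below mirrors the C program above; incrementUnsigned is a hypothetical helper:

```java
public class UnsignedView {
    // Adds 1 with ordinary (wrapping) int arithmetic, then reads the
    // resulting 32 bits as an unsigned number.
    static String incrementUnsigned(int bits) {
        return Integer.toUnsignedString(bits + 1);
    }

    public static void main(String[] args) {
        int bits = Integer.MAX_VALUE;
        bits++;                                              // bits: 1000 0000 ... 0000
        System.out.println(bits);                            // -2147483648 (signed reading)
        System.out.println(Integer.toUnsignedString(bits));  // 2147483648  (unsigned reading)

        // All 32 bits set is the largest unsigned 32-bit value; one more wraps to 0.
        System.out.println(Integer.toUnsignedString(-1));    // 4294967295
        System.out.println(incrementUnsigned(-1));           // 0
    }
}
```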

So that is what it was all about... when you use a language with fixed-size datatypes, remember that when you reach the limit of the datatype and increment the value, the result will typically be the lowest value that the datatype can store (in C, only unsigned types guarantee this). Depending upon your work, you might never face this situation in the real world - it might come up only in interview questions - but still, it pays to learn this and keep it in a corner of your mind. I have been referring to this phenomenon by using the word 'problem' all this while, but it actually does have a name: integer overflow.
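If silent wrap-around is not acceptable, Java 8 and later offer overflow-checking arithmetic in java.lang.Math: Math.incrementExact, Math.addExact and related methods throw ArithmeticException instead of wrapping. A minimal sketch (describeIncrement is a hypothetical helper):

```java
public class CheckedIncrement {
    // Returns x + 1 as text, or "overflow" instead of silently wrapping.
    static String describeIncrement(int x) {
        try {
            return Integer.toString(Math.incrementExact(x));
        } catch (ArithmeticException e) {
            return "overflow";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeIncrement(41));                // 42
        System.out.println(describeIncrement(Integer.MAX_VALUE)); // overflow
    }
}
```

The developer still has to opt in to these methods; plain + and ++ keep wrapping.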

P.S.: I am not sure if there are any strongly typed languages that handle this overflow by themselves instead of leaving it to the developer like Java and C/C++ do. If you do know of any, please do mention them in the comments!!
