Thursday, January 24, 2013

JAXB - Generating a <simpleType> with more than 256 <enumeration>s


So this was a strange error that we faced a few weeks ago.

The client we work for has various teams, each exposing its functionality to the other teams via web services. In effect, a web application can be built by having it talk to various web services to get work done. Our work that day was to build a new web service. It was similar to an existing web service, with certain differences in inputs and functionality between the two. For various reasons, we decided to copy the first web service's WSDL file and make the input changes in the copy.

While we were doing so, we found that the previous WSDL had a field for accepting the country code, but its data type was marked as string. We felt that this could lead to wrong country codes in the database, as people could input any value. Our database also had a master table that stored the country codes. While the web service code did verify the input against that table, we decided to change the type from string to a simpleType restricted to a fixed set of values. This would mean that our clients could never provide an invalid value without failing schema validation.

Basically, we wanted to change from:

<element name="countryCode" type="string"></element>
to
<element name="countryCode" type="CountryCode"></element>
with CountryCode type being defined thus:
<simpleType name="CountryCode">
  <restriction base="string">
    <enumeration value="IN"></enumeration>
    <enumeration value="US"></enumeration>
  </restriction>
</simpleType>
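
The snippets above are trimmed for readability. In a real schema the reference to CountryCode has to resolve to the schema's target namespace, so a slightly fuller sketch may help. The namespace URI and the `tns` prefix below are placeholders, not the ones from our project:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/services/types"
            targetNamespace="http://example.com/services/types"
            elementFormDefault="qualified">

  <!-- The element now refers to the restricted type instead of xsd:string -->
  <xsd:element name="countryCode" type="tns:CountryCode"/>

  <!-- Only the listed values pass schema validation -->
  <xsd:simpleType name="CountryCode">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="IN"/>
      <xsd:enumeration value="US"/>
      <!-- one xsd:enumeration per remaining country code -->
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```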

Since our database has 262 country codes, we decided to list all of them, giving CountryCode 262 <enumeration> entries. This turned out to be less work than we initially thought, thanks to copy-paste and IDEA's column selection feature.

We use Apache's cxf-codegen-plugin in our project to generate the Java classes that handle much of the XML-to-Java conversion. cxf-codegen-plugin ties into Maven's generate-sources phase to generate the Java classes. So when we ran mvn generate-sources, we expected an enum type called CountryCode with 262 constants.

In reality, the class was not generated at all.

I immediately suspected the number of enum constants, because I had never written or seen a Java enum with that many. So we trimmed the simpleType to one entry and ran mvn generate-sources, and this time the CountryCode class was generated, with one constant. When we brought back the entire list, no class was generated. So we commented out the entire list and slowly uncommented the entries, a few at a time from the top, to see at which point the error occurred. The Java file was generated fine all along until we reached the final few entries (about 6 or so). At that point, the Java file was not generated.

Again, the thought of some count limitation entered our heads. We were also entertaining the possibility that some character we had pasted was in a different encoding, or that some whitespace character had inadvertently crept into our code because of the copy-paste. To rule out this second possibility, we deleted the <enumeration> entries for the 6 country codes and keyed them in by hand. Still, it did not work. To rule it out further, we commented out all the entries and then slowly uncommented them from the bottom up. The Java file was generated until we reached the top few entries, at which point it failed.

So we were back to our count hunch.

We wondered whether WSDL itself had an issue with so many <enumeration>s. We didn't think so, but we decided to check anyway. The WSDL spec did not mention any restriction on the number of <enumeration> tags in a <simpleType>. So we felt it had to be an issue with either the cxf-codegen-plugin or Java. Googling revealed that Java had a limit of 65535 on the number of fields, which is the figure usually quoted for enums. The practical ceiling for enum constants is lower, because the enum's static initialiser has to fit within the JVM's 64 KB method size limit, but it is still far above 262. So we ruled out Java as the problem.

So now the only thing left was the cxf-codegen-plugin. Googling revealed that it internally made use of JAXB. Further Googling brought up this link which said that you had to add the typesafeEnumMaxMembers attribute to your <globalBindings> tag to enable it to generate more than 256 elements in an enum type. This <globalBindings> tag is present in the bindings.xjb file in our project. We set typesafeEnumMaxMembers to 300 and found that we were able to generate the CountryCode.java file, with all 262 enum constants in it!

<globalBindings typesafeEnumMaxMembers="300"/>
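
For anyone who does not already have a bindings file, here is a minimal sketch of what a complete bindings.xjb containing only this setting could look like. It assumes JAXB 2.x, which uses the http://java.sun.com/xml/ns/jaxb namespace; the `jaxb` prefix is just a convention, and our real file had other customisations that are left out here:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- bindings.xjb: external JAXB binding customisations (minimal sketch) -->
<jaxb:bindings xmlns:jaxb="http://java.sun.com/xml/ns/jaxb"
               version="2.1">

  <!-- Raise the cap on enum constants per simpleType from the default 256.
       Any value at or above the number of <enumeration> entries will do. -->
  <jaxb:globalBindings typesafeEnumMaxMembers="300"/>

</jaxb:bindings>
```

A schema set may only carry one globalBindings declaration, so if your project already has one, add the attribute to the existing tag rather than declaring a second.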

This was a great relief, since we had been Googling for many hours and had become frustrated. Googling further, we learnt more about JAXB and the xjc tool. I was aware that JAXB could be used to convert XML to Java and vice versa, but I had never really delved into it. Hence xjc was new to me. In the end, I understood that it was xjc that did the job of generating the Java classes. You can customise the way xjc generates the classes by creating an external bindings file, which by convention has the extension '.xjb'.

And that's where the 'bindings.xjb' file in our project came in. You can tell JAXB about this bindings file by passing the file name to the -b parameter of the xjc command. Since we were using the cxf-codegen-plugin and not the xjc command directly, we configured these arguments via the <executions> tag of the <plugin> tag in pom.xml. Basically, we did this:


<plugin>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-codegen-plugin</artifactId>
  <executions>
    <execution>
      <configuration>
        <defaultOptions>
          <extraargs>
            <extraarg>-b,${basedir}/src/main/resources/bindings.xjb</extraarg>
          </extraargs>
        </defaultOptions>
      </configuration>
    </execution>
  </executions>
</plugin>
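
The snippet above is trimmed to the part that matters. For context, here is a sketch of how the whole execution might look once the goal, the phase and a WSDL are filled in. The `cxf.version` property, the execution id and the WSDL path are placeholders I made up for illustration; the extraarg line is the one from our project:

```xml
<plugin>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-codegen-plugin</artifactId>
  <!-- Assumption: cxf.version is a property defined elsewhere in the POM -->
  <version>${cxf.version}</version>
  <executions>
    <execution>
      <id>generate-sources</id>
      <!-- This is what makes "mvn generate-sources" trigger the code generation -->
      <phase>generate-sources</phase>
      <goals>
        <goal>wsdl2java</goal>
      </goals>
      <configuration>
        <!-- defaultOptions are applied to every wsdlOption listed below -->
        <defaultOptions>
          <extraargs>
            <extraarg>-b,${basedir}/src/main/resources/bindings.xjb</extraarg>
          </extraargs>
        </defaultOptions>
        <wsdlOptions>
          <wsdlOption>
            <!-- Hypothetical WSDL location -->
            <wsdl>${basedir}/src/main/resources/wsdl/ExampleService.wsdl</wsdl>
          </wsdlOption>
        </wsdlOptions>
      </configuration>
    </execution>
  </executions>
</plugin>
```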



One thing that made us wonder was why there was a limit in the first place, and why the default value was 256. We were not able to find any answers to this, but the JAXB spec itself lists the default value as 256. I read somewhere on the Internet that this was because a Java enum with 256 entries is unmanageable and unmaintainable. But we felt that even 100 to 200 entries would be unmanageable. In that case, why isn't the default value somewhere between 100 and 200? Why specifically 256?