Thursday, December 24, 2009

Why is i4i's patent important?

i4i's patent, which Microsoft has been found to infringe in Word 2007, is about a generational leap in the capability of computers to process data.
  • HTML was the first generation. If you wanted the author's name to be seen in bold, you used a formatting tag thus: <b>bold text</b>
  • SGML was the next generation: it operated with a special complement of descriptive "verbs" or "tags", together with software that could process and interpret these efficiently. The advantage of such an approach was that different stylesheets could depict the same text differently, which promoted re-usability of data. The problem lay in the restricted number of tags.
  • XML broke this limitation of SGML: now one could create one's own tags, and browsers could render the document so long as it was "well-formed", i.e. every open-tag had a corresponding close-tag and vice versa, with the tags properly nested. More sophisticated constraints could be imposed on the document structure through rules embedded in a Document Type Definition (DTD) or Schema. This provided another breakthrough in terms of the range of applications -- no longer did one need standard ERP software in order to exchange data; XML coders and decoders did the job, and organisations could simply exchange XML files representing transactional data, independent of database software. Thus markup languages made data interchange easy and cheap. Many other applications followed, making XML a development of nearly revolutionary proportions.
  • However, with all these developments, tags (which are commands to the computer) were interspersed with the data. This meant that when reading the data stream, the computer first had to work out whether each character it read was part of the data or part of a command. This slowed down the reading and processing of a document or an object. While this may not be apparent on the scale of data that most of us are used to dealing with, where there are mountains of data to process it is a serious drain on time and efficiency.
  • This is where the elegant concept of i4i's patent comes in. If commands can be interpreted independently of the content, the computer can read all the content at one go and process it by implementing the commands in serial order. In other words, if all the commands in a data object ("file") are found in one place and all the content in another, the computer no longer needs to evaluate every character to find out whether it is part of a command or of the content. This affords a huge, generational leap in efficiency: for the same computing power, a lot more data can be crunched in much less time. In effect, this could make computing power cheaper by raising the efficiency with which computers process data.
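
The separation described in the last point can be sketched in a few lines of Python. This is only an illustration of the idea, not the data format used in the patent or in Word: the names separate, recombine and metacode_map, and the choice to record each tag against a character offset in the tag-free content, are my own assumptions.

```python
import re

# Matches simple open/close tags such as <title> or </title>.
# (Hypothetical, simplified tag syntax: no attributes, no comments.)
TAG = re.compile(r"<(/?)([A-Za-z][\w-]*)>")


def separate(marked_up):
    """Split inline-tagged text into raw content and a list of (offset, tag)."""
    content = []
    metacode_map = []
    offset = 0  # position in the tag-free content
    cursor = 0  # position in the marked-up input
    for match in TAG.finditer(marked_up):
        text = marked_up[cursor:match.start()]
        content.append(text)
        offset += len(text)
        # Record which tag applies at which point of the raw content.
        metacode_map.append((offset, match.group(0)))
        cursor = match.end()
    content.append(marked_up[cursor:])
    return "".join(content), metacode_map


def recombine(content, metacode_map):
    """Re-insert the tags into the raw content at their recorded offsets."""
    pieces = []
    cursor = 0
    for offset, tag in metacode_map:
        pieces.append(content[cursor:offset])
        pieces.append(tag)
        cursor = offset
    pieces.append(content[cursor:])
    return "".join(pieces)


doc = "<author>Jane Doe</author> wrote <title>Markup</title>."
content, metacode_map = separate(doc)
print(content)       # the raw text, free of tags
print(metacode_map)  # the commands, each with the offset it applies at
```

Once the two parts are stored apart, the content can be read straight through without testing each character, and the map can be walked in order whenever the structure is needed.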

Patent Infringement Case against Word 2007


On reading Patent No. 5,787,449, applied for on 2 June 1994 and granted to i4i on 28 July 1998, for a "Method and system for manipulating the architecture and the content of a document separately from each other", and some MSDN literature relating to "Custom XML", which is claimed to be a Microsoft invention, I observed the following.

  • The patent deals with a method of keeping raw, unstructured data separate from its formatting or presentation-related information. This is different from what is usually understood as XML, because the content of an XML file is structured, not raw.
  • The patent application clearly differentiates the method from earlier standards including TROFF, RTF and SGML by showing that what is being patented has no codes embedded in the content, but instead has a content part and a metacode map part stored separately. One could have multiple metacode maps acting upon the same content. The content could therefore be literally anything. Thus, for consistent content that rarely changes, repeated re-use of the content with different metacode maps, each serving a different purpose, becomes possible.
  • This is uncomfortably close to what Microsoft calls "Custom XML" on its MSDN library site. Indeed, way back in 2005, one of Microsoft's lead programmers, Brian Jones, wrote on an MSDN blog about the new "Custom XML". If you read that entry, it becomes quite clear even to a relative layman that Microsoft meant to put an "envelope" around any data (a Word document, a spreadsheet or anything else) placed in what is called the "XML Data Store", so that the envelope and the data together form a composite object. The resultant object, which you and I understand as the MS Office 2007 document format, is called the "Office Open XML package". The advantages of this are expounded in the same blog entry. Brian Jones admits (gushes, actually) there (in 2005, remember!) that for Microsoft, it is a new feature.
  • Neither Microsoft's Custom XML nor i4i's patented method is really about XML. Using this method to store structured, formatted XML content is only a subset of what the system can do. It can store binary (or raw) data as easily as it can store structured text content.
  • Brian Jones' gushing about a new feature when it had already been patented for 7 years recalls the scathing, withering review of Bill Gates' book, Business @ the Speed of Thought, which said that Gates predicts the past. Much worse, while Gates only became an object of intellectual scorn to the reviewer, what Brian Jones and his ilk have done is to drive Microsoft into a patent infringement hole -- costing at least $290 million -- and that won't look pretty from inside Microsoft.
  • Brian Jones or others in Microsoft may have re-invented the wheel, but they cannot claim ignorance of the i4i patent, given that Microsoft probably has one of the largest legal departments of any company in the world, and every product presumably undergoes IPR infringement vetting before going to market.
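
The point about several metacode maps acting on one piece of content can be illustrated with a small Python sketch. Everything in it is hypothetical: the span format (start, end, name), the map names and the apply_map helper are my own, and are not taken from the patent or from Microsoft's Custom XML.

```python
# Raw content, stored once and never modified.
content = "Jane Doe wrote Markup."

# Two hypothetical maps over the same content. Each entry is
# (start, end, name): wrap content[start:end] in <name>...</name>.
bibliographic_map = [(0, 8, "author"), (15, 21, "title")]
emphasis_map = [(15, 21, "em")]


def apply_map(content, spans):
    """Return the content with each span wrapped in its tag.

    Assumes the spans do not overlap; they are applied in order of position.
    """
    pieces = []
    cursor = 0
    for start, end, name in sorted(spans):
        pieces.append(content[cursor:start])
        pieces.append(f"<{name}>{content[start:end]}</{name}>")
        cursor = end
    pieces.append(content[cursor:])
    return "".join(pieces)


bibliographic_view = apply_map(content, bibliographic_map)
emphasis_view = apply_map(content, emphasis_map)
print(bibliographic_view)
print(emphasis_view)
```

The same stored text yields two differently structured views, and the content itself is left untouched by either map.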
To conclude, I think the decision is fair, the concept was clearly patented, and the i4i patent was clearly infringed, albeit under new names of "XML Data Store" and "Custom XML".

Thursday, December 3, 2009

The US was also a major IPR pirate not so long ago!

The US just loves to paint developing countries like China and India in dark colours when it comes to respecting IPR, but here are a couple of articles written during the second half of the 19th century, when England was relatively more prolific in arts and letters than the United States. It transpires that the United States of that time was not much different from what it alleges China is today. In other words, the US's own record in this regard has hardly been impeccable.

In those days, there was no established international copyright code. Therefore, payments of royalties and recognition for foreign authors were not legally enforceable but were based on honesty and "courtesy of the trade".

In 1867, one James Parton wrote, "For forty years or more we have all been buying our books and reviews at thieves' prices ... Can any one suppose that the proprietors like to see Blackwood and half a dozen other British magazines sold all over the country at a little more than the cost of paper and printing?" He chronicles several instances of authors unable to cash in on the success of their works, and makes out a cogent case for an International Copyright.

Then, in 1879, Arthur Sedgwick wrote, "... piracy still flourishes as a profitable branch of trade. ... The attitude of the United States on the subject of copyright is more remarkable than that of any other modern country. ... It has ... studiously fostered international piracy, and refused to foreigners the benefits of its copyright law."

James Fallows, in a more recent article written in December 1993, suggests that cheating and cutting corners to get ahead, and then, once strong, advocating set rules of fair play and chiding other powers for failing to abide by them, was a standard pattern by which developing nations typically bolstered their international economic standing. We can see this pattern very regularly in the big international debates of the day -- be it agricultural subsidies, climate change initiatives, or IPR.

These writings, both old and relatively recent, lay bare that, notwithstanding the moral high ground developed nations adopt in multilateral negotiations, they were themselves not much different from the targets of their ire only a century and a quarter ago.