What about simplicity? The XML 1.0 grammar has 89 productions. Most of the XML I see could get by with about a quarter as many (exactly 22). We accomplish this marvelous feat by overriding two productions. Thus, a 2% change (2 of 89 productions) eliminates three quarters of XML’s grammar with near-zero loss in expressivity. That is, for the documents I usually see; YMMV, BOCTAOE, AOQA (and other qualifying acronyms). I call it LessML, and the best thing about LessML is that it’s still XML: every LessML document is an XML document.
By overriding the right productions, we get to keep the baby and toss out the bathwater:
[1] document ::= S? element S?
[43] content ::= (CharData | element | Reference)*
That’s all there is to it. With that small change, 67 of XML’s 89 productions become unreachable. The remaining reachable ones are:
[3] S ::= (#x20 | #x9 | #xD | #xA)+
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[25] Eq ::= S? '=' S?
[39] element ::= EmptyElemTag | STag content ETag
[40] STag ::= '<' Name (S Attribute)* S? '>'
[41] Attribute ::= Name Eq AttValue
[42] ETag ::= '</' Name S? '>'
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[84] Letter ::= BaseChar | Ideographic
[85] BaseChar ::= ... a whole bunch
[86] Ideographic ::= ... a whole bunch
[87] CombiningChar ::= ... a whole bunch
[88] Digit ::= ... a whole bunch
[89] Extender ::= ... a whole bunch
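To make the trimmed grammar concrete, here is a minimal Python sketch of a recognizer for it. Each remaining production becomes either a regular expression or a small recursive method. The names `Parser`, `PATTERNS`, and `is_lessml` are hypothetical, and two shortcuts are assumptions rather than the spec: names are matched with Python's Unicode `\w` class instead of the BaseChar/Ideographic/Digit/CombiningChar/Extender tables, and character data is not checked against the `Char` ranges. It also enforces a few XML well-formedness constraints that the productions alone do not capture: matching end tags, unique attribute names, and (since a LessML document has no DTD) only the five predefined entities.

```python
import re

# Hypothetical LessML recognizer: XML 1.0 cut down to the reachable productions.
# Approximations (assumptions, not the spec): Name uses Python's Unicode \w class
# instead of the BaseChar/Ideographic/Digit/CombiningChar/Extender tables, and
# text is not checked against the Char production's code point ranges.
# Entity references are limited to the five predefined entities, since a
# document without a DTD may not use any others (WFC: Entity Declared).
REF = r"&#[0-9]+;|&#x[0-9a-fA-F]+;|&(?:amp|lt|gt|apos|quot);"  # [66]-[68]
PATTERNS = {
    "S": re.compile(r"[ \t\r\n]*"),                  # [3] S, made optional here
    "Name": re.compile(r"(?:[^\W\d]|:)[\w.:-]*"),    # [5] Name (approximated)
    "AttValue": re.compile(rf"\"(?:[^<&\"]|{REF})*\"|'(?:[^<&']|{REF})*'"),  # [10]
    "CharData": re.compile(r"[^<&]+"),               # [14], ']]>' checked separately
    "Reference": re.compile(REF),                    # [67] EntityRef | CharRef
}


class Parser:
    def __init__(self, text):
        self.s, self.i = text, 0

    def peek(self, lit):
        return self.s.startswith(lit, self.i)

    def expect(self, lit):
        if not self.peek(lit):
            raise ValueError(f"expected {lit!r} at offset {self.i}")
        self.i += len(lit)

    def match(self, production):
        """Consume one match of a production's regex and return the matched text."""
        m = PATTERNS[production].match(self.s, self.i)
        if not m:
            raise ValueError(f"expected {production} at offset {self.i}")
        self.i = m.end()
        return m.group()

    def element(self):  # [39] element via [40] STag or [44] EmptyElemTag
        self.expect("<")
        tag, seen = self.match("Name"), set()
        while True:
            spaced = self.match("S") != ""
            if self.peek("/>"):
                self.i += 2
                return  # empty-element tag: no content, no end tag
            if self.peek(">"):
                self.i += 1
                break
            if not spaced:  # attributes must be separated by whitespace
                raise ValueError(f"expected whitespace at offset {self.i}")
            attr = self.match("Name")  # [41] Attribute ::= Name Eq AttValue
            if attr in seen:  # WFC: Unique Att Spec
                raise ValueError(f"duplicate attribute {attr!r}")
            seen.add(attr)
            self.match("S")  # [25] Eq ::= S? '=' S?
            self.expect("=")
            self.match("S")
            self.match("AttValue")
        self.content()
        self.expect("</")  # [42] ETag
        if self.match("Name") != tag:  # WFC: Element Type Match
            raise ValueError(f"end tag does not match <{tag}>")
        self.match("S")
        self.expect(">")

    def content(self):  # [43] as overridden: (CharData | element | Reference)*
        while self.i < len(self.s) and not self.peek("</"):
            if self.peek("<"):
                self.element()
            elif self.peek("&"):
                self.match("Reference")
            elif "]]>" in self.match("CharData"):
                raise ValueError("']]>' is not allowed in CharData")

    def document(self):  # [1] as overridden: S? element S?
        self.match("S")
        self.element()
        self.match("S")
        if self.i != len(self.s):
            raise ValueError(f"unexpected content at offset {self.i}")


def is_lessml(text):
    """Return True if text matches the LessML subset sketched above."""
    try:
        Parser(text).document()
        return True
    except ValueError:
        return False
```

For example, `is_lessml('<doc a="1">Hi &#169; <br/></doc>')` is accepted, while anything with an XML declaration, comment, processing instruction, CDATA section, or DOCTYPE is rejected, because none of those productions can be reached from the overridden `document` and `content`.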
Personally, I’d be willing to lose even more. I’d start by simplifying names and references.
Commentary