General
The Open Directory Project distributes its data in this directory
as Resource Description Framework (RDF) files.
RDF is a graph description language expressed in eXtensible
Markup Language (XML).
Errata
Unfortunately, the format of the Open Directory Project data is not optimal.
In particular, it is much more verbose than it needs to be, and it is not even
legal XML.
Because of this, the format will need to change from time to time. We
will note the changes in this file until we have a better method for notifying
users of the changes.
Changes
- 2000-11-20
-
A few additions have been made to the RDF files. There is a tag for altlang
categories, which behaves exactly like the symbolic tags. Tags have also been
added for mediadate and ages (the latter for Kids_and_Teens); they appear on a
URL only when that URL has those properties.
- 1999-12-09
- The RDF files are going UTF-8. You may check out an advance copy
of this new format at http://rdf.dmoz.org/rdf/World.rdf.u8.gz.
I hope that this will clear up a lot of the problems that some users
have been having with the format. If you notice any problems, please
send mail to truel@dmoz.org.
We will continue to generate the current RDF files until at least
January 8, 2000. We will be generating UTF-8 files periodically until
that date. After January 9, all RDF files will be in the UTF-8
character set.
N.B. Some languages may have some incorrect characters. More
precisely some of our categories do not have a character set
associated with them yet, and so I am converting them to UTF-8 as
though they were encoded in ISO-8859-1. Please do not send me
email if you think you know what character set a given language
should be in, but only if you know what character set the given
ODP category is in.
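
The fallback described in the note above can be sketched in Python. This is only an illustration, assuming the conversion is a plain byte-for-byte ISO-8859-1 decode followed by a UTF-8 encode; the function names are hypothetical and not part of any ODP tool. It also shows how a consumer who knows a category's real character set might undo the fallback.

```python
def convert_as_latin1(raw: bytes) -> bytes:
    """Convert bytes to UTF-8 as though they were ISO-8859-1 (assumed fallback)."""
    return raw.decode("iso-8859-1").encode("utf-8")


def repair(utf8_bytes: bytes, real_charset: str) -> str:
    """Undo the ISO-8859-1 fallback when the category's real charset is known.

    ISO-8859-1 maps every byte value to one code point, so encoding the
    text back to ISO-8859-1 recovers the original bytes exactly.
    """
    original = utf8_bytes.decode("utf-8").encode("iso-8859-1")
    return original.decode(real_charset)


# A category that was really stored in ISO-8859-2:
stored = "\u010c".encode("iso-8859-2")   # the single byte 0xC8
shipped = convert_as_latin1(stored)      # reads as U+00C8 in the UTF-8 file
fixed = repair(shipped, "iso-8859-2")    # U+010C again
```

The repair only works for text that went through the fallback; applying it to a category that was converted with its correct character set would corrupt it.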
- 1999-08-25
- I have created an eGroups.com mailing list to announce changes to
the RDF format. To sign up, enter your email address in the list's
sign-up form.
- 1999-08-24
- We now provide redirect.rdf.gz, which lists
categories that have been moved and where they have been moved to.
This should obviate your need for the catmv.log.gz file.
Redirections here are pre-chained. That is, if a category has moved
several times, the redirection listed is the first one that actually
hits a category. If someone moves a directory around and
someone else creates a directory at one of the intermediate locations,
the newcomer is the redirection listed.
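
To make the pre-chaining rule concrete, here is a minimal Python sketch. It does not read redirect.rdf.gz (its layout is not described here); it assumes a hypothetical in-memory log of moves and a set of currently existing category paths, and the paths shown are made up.

```python
from typing import Dict, Set


def prechain(moves: Dict[str, str], live: Set[str]) -> Dict[str, str]:
    """Collapse a log of category moves into single-hop redirects.

    moves: old path -> path it was moved to (hypothetical input shape).
    live:  paths that currently exist as categories.
    """
    redirects = {}
    for old in moves:
        if old in live:
            continue  # the old path is occupied again, nothing to redirect
        seen = {old}
        target = moves[old]
        # Follow the chain until it reaches a live category, ends, or loops.
        while target not in live and target in moves and target not in seen:
            seen.add(target)
            target = moves[target]
        if target in live:
            redirects[old] = target
    return redirects


moves = {"Top/A": "Top/B", "Top/B": "Top/C"}
plain = prechain(moves, {"Top/C"})
newcomer = prechain(moves, {"Top/B", "Top/C"})
```

In the first call both old paths resolve straight to Top/C. In the second, someone has recreated Top/B, so Top/A redirects to that newcomer rather than to Top/C. A consumer of a pre-chained file therefore needs only one lookup per old path.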
- 1999-07-29
- Character escaping is being done inside all fields now,
not just in Titles and Descriptions. The following four characters
are being quoted, so you will have to unquote them when converting to
html:
| & | &amp; |
| < | &lt; |
| > | &gt; |
| " | &quot; |
High-byte characters and non-printing control characters are also
being quoted now. I have decided against using actual character
references, since supporting full Unicode is beyond the
capabilities of some of our customers. Instead, the hex value of
these bytes will be presented, and if you wish to convert to
Unicode, you will have to keep track of the charset for the given
category.
As an example, the byte value 200 will be presented as
&xC8; whether that character came from the 8859-1 character
set (where it is È) or from 8859-2 (where it is Č)
or from any other character set.
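
A consumer of the format described in this entry might unquote a field roughly as follows. This Python sketch is not an official ODP tool: the function name is hypothetical, and it assumes that every hex escape has the form &xHH; with exactly two hex digits and that the caller already knows the category's character set.

```python
import re

# The four quoted characters, plus the &xHH; byte escapes (assumed two digits).
_NAMED = {b"amp": b"&", b"lt": b"<", b"gt": b">", b"quot": b'"'}
_TOKEN = re.compile(rb"&(amp|lt|gt|quot|x[0-9A-Fa-f]{2});")


def unquote_field(raw: bytes, charset: str) -> str:
    """Undo the quoting in one field, then decode it with the category's charset."""

    def replace(match):
        name = match.group(1)
        if name in _NAMED:
            return _NAMED[name]
        # Hex escape: turn the two digits after 'x' back into one byte.
        return bytes([int(name[1:].decode("ascii"), 16)])

    # A single pass ensures "&amp;lt;" becomes "&lt;" and is not unquoted twice.
    return _TOKEN.sub(replace, raw).decode(charset)


latin1 = unquote_field(b"&xC8;", "iso-8859-1")
latin2 = unquote_field(b"&xC8;", "iso-8859-2")
mixed = unquote_field(b"Caf&xE9; &amp; &lt;b&gt;", "iso-8859-1")
```

The same escaped byte decodes to a different character depending on the charset passed in, which is why the charset has to be tracked per category. Text destined for HTML would need the four special characters escaped again after decoding.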
- 1999-05-18
-
Symbolic links that have been separated from the rest of the subcategories
now have the link type "<symbolic1 ...>". This is exactly analogous
to "<narrow1 ...>" (for separated subcategories).
- 2002-07-23
-
Data in the Netscape/ tree is no longer included in the main RDF dump.
Instead, it is provided in these files:
netscape-content.rdf
netscape-structure.rdf
netscape-terms.rdf