data consistency checks for OSM
Interfacing with KeepRight
You may use KeepRight's results in a number of ways:
exporting GPX waypoints
Purpose
Exporting a section of the map into a GPX-styled list of waypoints for use with GPS units
URL format
https://keepright.at/export.php?format=gpx&ch=20,30,311,312&left=-82.39&bottom=30&right=-82.1&top=30.269
You can specify a list of error types you want to have in the file as well as a bounding box on the map. This export will return up to 10000 waypoints.
There is a link in the lower left corner of the map page pointing to the GPX service; it always includes the current error type selection and the current map view.
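A minimal sketch of building such a URL programmatically, assuming the parameters behave exactly as in the example URL above. The function name `build_export_url` is made up for illustration and is not part of KeepRight.

```python
from urllib.parse import urlencode

BASE_URL = "https://keepright.at/export.php"


def build_export_url(fmt, error_types, left, bottom, right, top):
    """Return an export URL for a format, a list of error types and a bounding box."""
    params = {
        "format": fmt,
        # error types go into one comma-separated "ch" parameter
        "ch": ",".join(str(t) for t in error_types),
        "left": left,
        "bottom": bottom,
        "right": right,
        "top": top,
    }
    # safe="," keeps the commas literal, as in the documented example URL
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_export_url("gpx", [20, 30, 311, 312],
                       left=-82.39, bottom=30, right=-82.1, top=30.269)
print(url)
```

With these arguments the result is the example URL shown above.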
exporting new errors as RSS feed
Purpose
Watching a section of the map for newly found errors
URL format
https://keepright.at/export.php?format=rss&ch=20,30,311,312&left=-82.39&bottom=30&right=-82.1&top=30.269
The URL format is the same as for GPX exports, just the format parameter is different. The RSS feed will include error entries that were first found within the last three weeks.
There is a link in the lower left corner of the map page pointing to the RSS service; it always includes the current error type selection and the current map view.
getting the whole dump-file
Purpose
Doing something completely different with 25 million errors...
URL format
https://keepright.at/keepright_errors.txt.bz2
This tab-separated file contains all errors currently open for the whole planet (currently more than 500 MB). It is updated daily.
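A minimal sketch of streaming the dump without unpacking it first. It assumes that the first line of the file holds the column names and that the file is UTF-8 encoded. The name `iter_errors` is made up for illustration. MySQL-style escapes (for example \N for NULL), if the dump uses them, are left untouched.

```python
import bz2
import csv


def iter_errors(path):
    """Yield one dict per error from a bz2-compressed, tab-separated dump.

    Assumption: the first line of the file is a header naming the columns.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: quote characters inside free-text fields are ordinary text
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

Calling `iter_errors("keepright_errors.txt.bz2")` on the downloaded file then yields the errors one at a time, so the whole dump never has to fit in memory.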
Table layout
- schema
The schema is an identifier naming a region on the planet. According to the planet splitting map, the planet is split into rectangular parts to get roughly equally sized dump files. Consider the schema a prefix for the error_id.
- error_id
A number identifying errors, starting from 1 for each schema. An error_id is worth nothing if you don't know the schema!
- error_type
Numeric representation of the type of error. Error types are assigned in blocks of ten (20, 30, 40...). They correspond to the name of the script file doing the checking.
Error types may be sub-typed (281, 282 etc.). Sub-typed checking routines test different aspects of a single topic: in this example 280 means "boundaries", 281 means "missing name[ for boundaries]" and 282 means "missing admin level[ for boundaries]". Sub-typed error types are rendered as groups that may be collapsed on the web site.
- error_name
Textual representation (short name) of the type of error. For sub-typed error types you may want to prepend the error_name of the main error type to make the name complete.
- object_type
one of node/way/relation
- object_id
an OSM node_id, way_id or relation_id
- state
One of new, reopened, ignore_temporarily, ignore. You won't see any 'cleared' errors because the dump contains active errors only. Temporarily ignored errors are issues that a user has marked as fixed. They jump back to the 'new' state with the next update if the error isn't really fixed. Ignored errors are false positives that should never come back: KeepRight is wrong and the exception cannot be included in the ruleset.
Errors that were once cleared and come back at some later point in time are put in the reopened state. Please note that this may also happen due to runtime errors in the scripts, so you may just treat new and reopened as the same.
- msgid
This is the scaffold for the error description, where placeholders ("$i") stand in place of the actual values inserted by the concrete error instance. You may put this scaffold inside a GNU gettext() function to have it translated. GNU gettext requires a .po file that holds original and translated strings. Existing .po files are available for de and pt_BR.
The GNU gettext template file is keepright.pot.
- txt1 ... txt5
These bits of text are the contents that have to be inserted in the error message after translation. txt1 will replace $1, etc.
- first_occurrence
Timestamp (CEST, German: MESZ) of when this error was first found
- last_checked
Timestamp (CEST) of the last time this error was (re-)checked by the scripts
- object_timestamp
Timestamp of the object that was used when checking, as found in the official planet file.
- user_name
User name of the user that last edited the given object.
- lat
- lon
Location on the planet. Coordinates are given in the same projection as found in the official planet file. Please note that the numbers are stored as integer values. To convert back to real lon/lat you have to divide by 10^7.
- comment
User-comment (if any)
- comment_timestamp
Timestamp (CEST) of when the comment was given (if any)
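The field descriptions above can be sketched as a small decoding step. This is only an illustration under assumptions: rows are dicts of strings keyed by column name, the main error type of a sub-type is its block of ten (281 belongs to 280), and all function names as well as the sample row are made up. A translation function (for instance one backed by gettext) can be passed in and is applied to msgid before the placeholders are filled.

```python
import re


def main_error_type(error_type):
    """Map a sub-type such as 281 to its block of ten (280)."""
    return (int(error_type) // 10) * 10


def render_message(msgid, texts, translate=lambda s: s):
    """Translate msgid, then replace $1..$5 with txt1..txt5 in a single pass."""
    def fill(match):
        index = int(match.group(1)) - 1
        value = texts[index] if index < len(texts) else None
        # leave the placeholder as-is when no value is available
        return match.group(0) if value is None else str(value)
    return re.sub(r"\$([1-5])", fill, translate(msgid))


def to_degrees(value):
    """Convert the dump's integer coordinate to degrees (divide by 10^7)."""
    return int(value) / 10**7


def decode_error(row):
    """Build a small, readable record from one row of the dump."""
    return {
        # error_id is only meaningful together with its schema
        "key": (row["schema"], int(row["error_id"])),
        "main_type": main_error_type(row["error_type"]),
        "message": render_message(
            row["msgid"], [row.get(f"txt{i}") for i in range(1, 6)]),
        "lat": to_degrees(row["lat"]),
        "lon": to_degrees(row["lon"]),
    }


# made-up sample row, not taken from a real dump
sample = {
    "schema": "4", "error_id": "17", "error_type": "281",
    "msgid": "made-up text about way $1 and node $2",
    "txt1": "123", "txt2": "456", "txt3": "", "txt4": "", "txt5": "",
    "lat": "482080000", "lon": "163720000",
}
decoded = decode_error(sample)
print(decoded["key"], decoded["main_type"], decoded["message"])
```

Filling the placeholders in one regular-expression pass avoids replacing a "$2" that happens to appear inside the text inserted for "$1".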
loading the errors table
This is the schema definition for use with MySQL databases:
CREATE TABLE IF NOT EXISTS `keepright_errors` (
`schema` varchar(6) NOT NULL default '',
`error_id` int(11) NOT NULL,
`error_type` int(11) NOT NULL,
`error_name` varchar(100) NOT NULL,
`object_type` enum('node','way','relation') NOT NULL,
`object_id` bigint(64) NOT NULL,
`state` enum('new','reopened','ignore_temporarily','ignore') NOT NULL,
`first_occurrence` datetime NOT NULL,
`last_checked` datetime NOT NULL,
`object_timestamp` datetime NOT NULL,
`user_name` text NOT NULL,
`lat` int(11) NOT NULL,
`lon` int(11) NOT NULL,
`comment` text,
`comment_timestamp` datetime,
`msgid` text,
`txt1` text,
`txt2` text,
`txt3` text,
`txt4` text,
`txt5` text
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
mysql --local-infile --password --user=root --execute "LOAD DATA LOCAL INFILE 'keepright_errors.txt' INTO TABLE keepright_errors CHARACTER SET utf8 IGNORE 1 LINES;" osm_EU
Please note that schema is a reserved word in MySQL, so you always have to quote it like this: `schema`
There are two candidate keys in this table, a natural one and an artificial one (the CREATE TABLE statement above declares neither as PRIMARY KEY):
The natural key consists of error_type, object_type, object_id, lat, lon. That means one type of error may be found at multiple spots belonging to one single object (e.g. self-intersections of ways).
The artificial primary key consists of schema and error_id. It is used just for simplicity of referencing individual error instances and it is completely redundant.
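A minimal sketch of the two keys, assuming rows are dicts keyed by column name. The helper names and the sample rows (including the error type number 999) are made up for illustration. It shows why lat and lon are part of the natural key: the same error type can occur twice on one object at different spots.

```python
from collections import Counter


def artificial_key(row):
    """schema + error_id: short handle for one error instance."""
    return (row["schema"], row["error_id"])


def natural_key(row):
    """error_type + object + location: what the error actually is."""
    return (row["error_type"], row["object_type"], row["object_id"],
            row["lat"], row["lon"])


def duplicate_keys(rows, key_func):
    """Return the keys that occur more than once among rows."""
    counts = Counter(key_func(row) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# made-up rows: one way with the same (hypothetical) error type at two spots
rows = [
    {"schema": "4", "error_id": "1", "error_type": "999", "object_type": "way",
     "object_id": "1000", "lat": "482080000", "lon": "163720000"},
    {"schema": "4", "error_id": "2", "error_type": "999", "object_type": "way",
     "object_id": "1000", "lat": "482090000", "lon": "163730000"},
]

print(duplicate_keys(rows, artificial_key))               # no duplicates
print(duplicate_keys(rows, natural_key))                  # no duplicates
print(duplicate_keys(rows, lambda r: natural_key(r)[:3])) # collides without lat/lon
```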
querying node counts
As a by-product the scripts create a file that contains the number of nodes per square degree found in the planet file. Resolution is 0.1 degrees. You may download the file here: https://keepright.at/nodecount.txt.bz2
This dump file can be useful for statistics if you want to calculate an `errors per node` measure.
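A rough sketch of such a measure. The layout of nodecount.txt is not described here, so loading it is left out: `node_counts` is a hypothetical dict that maps a 0.1 degree cell, given as (lat index, lon index), to its node count. Whether the file's cells line up with this floor-based grid is an assumption. Error rows are assumed to carry lat and lon as the dump's integers (degrees times 10^7).

```python
from collections import Counter

# 0.1 degree expressed in the dump's integer units (degrees * 10^7)
UNITS_PER_CELL = 10**6


def cell_of(lat_int, lon_int):
    """Return the 0.1 degree cell index pair containing the coordinate."""
    # floor division also rounds negative coordinates down, e.g. -82.39 -> -824
    return (int(lat_int) // UNITS_PER_CELL, int(lon_int) // UNITS_PER_CELL)


def errors_per_node(error_rows, node_counts):
    """Return {cell: errors / nodes} for every cell with a known node count."""
    error_counts = Counter(cell_of(r["lat"], r["lon"]) for r in error_rows)
    return {
        cell: count / node_counts[cell]
        for cell, count in error_counts.items()
        if node_counts.get(cell)  # skip cells with missing or zero node count
    }


# made-up numbers for illustration
errors = [
    {"lat": "482080000", "lon": "163720000"},
    {"lat": "482090000", "lon": "163730000"},
    {"lat": "300100000", "lon": "-823900000"},
]
node_counts = {(482, 163): 200}
ratios = errors_per_node(errors, node_counts)
print(ratios)
```

Cells without a node count are skipped rather than reported as zero, so a missing entry is not mistaken for an error-free area.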