User:Clutch/mod wiki

From Wikipedia, the free encyclopedia

Introduction

mod_wiki will be an Apache module which takes URL requests and supplies the matching Wiki pages. mod_wiki will be written in C. All Wiki pages will be transformed to html "on the fly". mod_wiki will be a generic Wiki implementation, behaving by default like the Wikipedia, but configurable for a wide array of uses.

The opportunity will be taken to put things like SQL statements and HTML templates in text configuration files where they belong. Right now they are hardwired into the PHP code, which makes modification and configuration harder than necessary. Everything possible will be templatized and stored in the database for easier customizability via the web.

The switch to PostgreSQL

All Wiki data will be stored in a PostgreSQL or other SQL92 compliant database, and accessed via a persistent connection between mod_wiki and the database. Use will be made of constraints, prepared queries, and subqueries to make the most common case, viewing an article, doable with a minimum of SQL queries. The current codebase requires twelve separate queries.

The VACUUM ANALYZE command will be run every day so that the database will have the statistics it needs to optimize itself to fit the usage patterns it actually sees.

All data in binary namespaces will be stored in base64 encoded format. If we don't do this, we would have to split binary objects into a separate table from the table holding text objects.

Postgres automatically compresses text entries on the backend for storage, which should provide big space savings and make data access quicker, because less data will need to be read from the disk.

Integration of Other Wikis

Other Wikis will be supported as external Wikis; however multiple Wikis in the same database will be supported via the namespace and language mechanism. Other language wikis will be able to run on the main server, or split off and run on their own. Mechanism will be provided for importing a wiki.

One little potential alteration

Currently there are links on the Recent Changes and Watch List pages to let you view articles within the past X number of days, or to only view the last N number of edits. Instead, we should put in textboxes where they can type in the exact values they want, with 0 or -1 meaning "infinity".

Deletions

Deleted articles are moved from the current and previous tables to the deleted table. Their entry in the titles table is left intact.

Redirects

When an article is redirected, it's entry in the current table will be moved to the previous table and not replaced; the article id, and the id of the article it is being redirected to will be placed in the redirects table. Only a single level of redirects will be supported; if you try to redirect to a page that is a redirect, you will get an error that tells you which page to redirect to. An article redirecting to itself will be left as a normal article and not redirected at all.

Permissions

The essence of Wiki is that no page has an "owner". However, some permissions scheme is needed for groups involved in politically sensitive projects, or in some sort of commerce where some pages should be private. The following is an attempt to provide such a scheme without losing the spirit of Wiki.

Each individual user has no permission by himself; all user permissions are gained through membership in groups. You could create a group with a single user in it, to have the effect of giving permissions to individual users, but this is not the intent and should be done rarely.

There are permission "sets" that let you set the "page" permissions for a particular page in relation to a group of users at the level of a namespace and at the level of an individual page.

A group can have permissions with respect to altering a page, and a group can have permissions with respect to altering a group.

The permission of an individual to perform an action on a page depends on which groups he is in. A user can be in any number of groups. The permission of groups is additive; as long as one of the groups he is in has permission to do an action to a page, he has permission to do that action to the page.

  • read the page
  • edit the page
  • create a page
  • delete a page
  • rename a page
  • redirect a page
  • see that a page has been edited on recent changes
  • find the page when doing a search
  • block an ip (with expiry date)
  • block a user account (with expiry date)
  • create a user account
  • delete a user account
  • create a group
  • delete a group
  • add a user to a group
  • remove a user from a group
  • alter a pages policy with regard to a particular group
  • alter a pages policy with regard to logged in users
  • alter a pages policy with regard to anonymous IPs

I'd also like to be able to password protect a page in the following ways:

  • password protect a page
  • password protect a namespace
  • make the password only valid for N number of times
  • make the login valid for one page view or action
  • make the login valid for a session
  • make the login valid for all time
  • alter a pages password

No clear-text protection and permission scheme is secure against Carnivore, or the NSA logging of 3 years worth of all packets going over the worlds routers. However, lower level government employees, municipalities, and assorted cranks still generally don't have access to those resources, so these permissions should still be useful.

User identification and IP addresses

The user id 0 (zero) stands for "no user" or "not logged in, anonymous IP". For each article revision, besides storing the appropriate user id, the originating ip of the edit will be stored in text format. Right now we are using IPv4 everywhere, but IPv6 may be used in the future. So, all IP addresses will be stored in IPv4-mapped IPv6 format. For instance, the IP 127.0.0.1 will be stored as the string ::FFFF:127.0.0.1. At some later time, when IPv6 becomes the most commonly used format, we can change to a 128 bit binary column to store the IP addresses for greater efficiency. Right now, storing the IP's as text is vastly more efficient space-wise.

Doing things this way makes it trivial to find out what IP's a user has made edits from; if someone says their account was hijacked, this makes it easier to help them correct it; correspondingly it also makes it easier to block a noxious user entirely.

Even if we do switch to using IPv6 in the future, the text representation may still be more space efficient, given that Postgres silently compresses text fields in the backend.

Blocking users and IP's

Sometimes certain users need to be restricted from logging in and from making any edits. So you can specify a user to block from logging in, or an IP to block from editing. The expiry is a date on which the ban can be ignored. The expiry field is in the same format as the timestamp field.

Cookies

Right now a cookie is set so that a user can stay logged in to the Wikipedia across web browser sessions. This will continue. In addition, a new cookie will be set that records a users preferences in how the HTML is generated. This will save an SQL query for each page view performed.

Database Schema

The tables listed here aren't sacred; if some aren't needed, delete them, if others are, insert them.

  • ns stands for "namespace".
  • lc stands for "language code"
  • minor stands for "minor changes"
  • origin is an IP number
CREATE TABLE current (
  ns integer,
  lc integer,
  id integer,
  user integer,
  size integer,
  talk bool,
  minor bool,
  timestamp bigint,
  data text,
  origin text,
  comment text,
);
CREATE TABLE previous (
  ns integer,
  lc integer,
  id integer,
  user integer,
  size integer,
  talk bool,
  minor bool,
  timestamp bigint,
  data text,
  origin text,
  comment text
);
CREATE TABLE deleted (
  ns integer,
  lc integer,
  id integer,
  user integer,
  size integer,
  talk bool,
  minor bool
  timestamp bigint,
  data text,
  origin text,
  comment text
);
CREATE TABLE titles (
  ns integer,
  lc integer,
  id integer,
  title text,
  perms integer
);
CREATE TABLE redirects (
  to integer,
  from integer
);
CREATE TABLE links (
  to integer,
  from integer
);
CREATE TABLE users (
  id integer,
  name text,
  nickname text,
  password text,
  newpassword text,
  login_cookie text,
  options_cookie text,
  last_viewed_talk text
);

The last_viewed_talk field is the timestamp of when the user last viewed their personal talk page; it is used in creating regular pages to determine whether to highlight the appropriate link to let them know they have a new message.

CREATE TABLE groups (
  id integer,
  group text
);

A formalized group of users should have a descriptive name for ease of administration.

CREATE TABLE groupperms (
  group integer,
  perms integer
);

In the absence of a permission specific to the group for a particular page, a groups default permissions for the entire wiki can be set here. Also, in this table alone, the other, non-page-editing related permissions may be set for a group.

CREATE TABLE membership (
  group integer,
  user integer
);

This table tells which users are in which groups.

CREATE TABLE bannings (
  user integer,
  ip text,
  start_date text,
  expiry_date text,
  sysop integer,
  comment text
);

If the user field is set to zero, the ip field represents an IP number to be blocked. For any other value of the user field, the ip field is ignored.

CREATE TABLE watchlist (
  user integer,
  page integer
);
CREATE TABLE namespaces (
  ns integer,
  namespace text,
  urlprefix text
);

If urlprefix is NULL or empty, then the namespace is internal; otherwise the namespace actually refers to an external Wiki. I conflate external wikis with namespaces because it is simple, convenient, and because the current syntax of Wikipedia links demands that external Wikis never be confused with internal namespaces.

CREATE TABLE languages (
  lc integer,
  language text
);

Simple languages table; the language field should be set to a format that is understood by web browsers, such as en_US.

CREATE TABLE nsperms (
  ns integer,
  group integer,
  perms integer
);

Here you can set the default permissions of a group for editing any page in a particular namespace.

CREATE TABLE pageperms (
  page integer,
  lang integer,
  group integer,
  perms integer
);

In this table you can set the permission of a group for editing a particular page in a particular language.