From Wikipedia, the free encyclopedia

Dear Wikipedians,

We are a group of three students pursuing our B.E. in IT (Bachelor of Engineering, Information Technology) from Mumbai University, India. At present we are working on a project titled "AUTO EXTRACTION OF CONTENTS FROM THE WORLD WIDE WEB" as part of our B.E. project at the renowned institute HBCSE-TIFR (Homi Bhabha Center for Science Education, Tata Institute of Fundamental Research), under the guidance of scientist Dr. Nagarjuna G. Our project is based on:

 OS	     - GNU/LINUX	
 Language     - Python
 Server       - Zope
 Application  - GNOWSYS

GNOWSYS (Gnowledge Networking and Organizing System) is a web application, written in Python, for developing and maintaining semantic web content; it works as an installed product in Zope. Our project involves automatically extracting data from the World Wide Web (WWW) and using GNOWSYS to handle this vast amount of data. This will not only let us store the data in the Gnowledge base in the form of meaningful relationships, but also let us see how GNOWSYS handles large amounts of data. The URL for our site is ""
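As a rough illustration of the extraction step described above, the following Python sketch pulls the visible text fragments and outgoing links out of an HTML page using only the standard library's `html.parser`. The class name `PageExtractor`, the helper `extract`, and the choice to skip `<script>`/`<style>` content are illustrative assumptions, not part of GNOWSYS or Zope; storing the results in the Gnowledge base is not shown.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects visible text fragments and href targets from an HTML page.

    Hypothetical helper for illustration only; not part of GNOWSYS.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.texts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            # Record the target of every <a href="..."> link.
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or stylesheet source.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def extract(html_source):
    """Return (text fragments, links) found in html_source."""
    parser = PageExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.texts, parser.links
```

For example, `extract('<p>Hello <a href="/wiki/Zope">Zope</a></p>')` yields the text fragments `['Hello', 'Zope']` and the link list `['/wiki/Zope']`, which could then be turned into relationships for the knowledge base.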

In this regard we could think of no one but Wikipedia, which is a phenomenon in itself. We would be glad if you could answer a few of our queries:

  1. In what format is the data stored in Wikipedia?
  2. Apart from HTTP or FTP, are there any other specific protocols in use that would be required to communicate with the Wikipedia server?
  3. How can we utilize the SQL dump?

We hope you will answer our queries at your earliest convenience.

With warm regards
Thanking you

-Rameez Don, Jaymin Darbari, Ulhas Dhuri

  1. The current revisions of the articles, and all the previous revisions, are stored as uncompressed wikitext in a MySQL database. Encouraged media formats are JPEG, PNG and Ogg Vorbis. Mathematical formulas are written in a TeX subset and rendered as PNG. HTML for anonymous users is cached, both compressed (gzip) and uncompressed.
  2. HTTP is the only supported protocol for communication with Wikipedia. Over HTTP, you may download rendered HTML, wikitext (via the edit pages), and database dumps (compressed SQL). An XML interface for downloading wikitext is in development.
  3. At the very least you need to install MySQL. If you want to view the rendered HTML locally, you need to download the PHP script from, and follow the instructions in the INSTALL file. This will involve installing MySQL, PHP and Apache.

-- Tim Starling 00:21, Oct 8, 2003 (UTC)
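The answers above say that wikitext can be obtained over HTTP through the edit pages, and that the server keeps gzip-compressed copies of cached HTML, so a response may arrive gzip-encoded. A minimal client-side sketch in Python, using only the standard library, might look like this. It assumes the edit form holds the wikitext in a `<textarea>` named `wpTextbox1` (an assumption about the page markup, so it is a parameter) and that pages are UTF-8; the helper names `TextareaExtractor`, `decode_body` and `wikitext_from_edit_page` are hypothetical. Fetching the page itself is left to whatever HTTP client is used.

```python
import gzip
from html.parser import HTMLParser


class TextareaExtractor(HTMLParser):
    """Captures the contents of one named <textarea> element (hypothetical helper)."""

    def __init__(self, field_name):
        # convert_charrefs=True turns &lt;, &amp; etc. back into literal
        # characters, recovering the original wikitext markup.
        super().__init__(convert_charrefs=True)
        self.field_name = field_name
        self._inside = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("name") == self.field_name:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.parts.append(data)


def decode_body(body, content_encoding=None):
    """Undo gzip compression if the server applied it, then decode as UTF-8 (assumed)."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def wikitext_from_edit_page(html_source, field_name="wpTextbox1"):
    """Return the wikitext held in the edit form's textarea (field name assumed)."""
    parser = TextareaExtractor(field_name)
    parser.feed(html_source)
    parser.close()
    return "".join(parser.parts)
```

A caller would combine the two steps as `wikitext_from_edit_page(decode_body(raw_bytes, encoding_header))`, where `raw_bytes` and `encoding_header` come from the HTTP response. Keeping decoding and parsing separate lets either step be swapped out, for instance once the XML interface mentioned above becomes available.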