Wire-Swig
Wire-Swig provides libwire bindings for several languages. libwire is part of WIRE, a Web crawler implementation. Wire-Swig focuses on retrieving information from WIRE's storage and index.
Download
- wire-swig-0.20.tar.gz
  - MD5: 9d9ba0fbb9b2493bdd2e42b35ff0656e
  - SHA1: 7f0aff591374a133157c8dfb54293d9a12189242
- wire-swig-0.10.tar.gz
  - MD5: 6bcef8aa045b4de2cb1b591e01bbd23d
  - SHA1: 59d9279d8123f5951f764fbd2103b075de3f1332
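A downloaded tarball can be checked against the digests listed above. The following is a minimal sketch, assuming a modern Python 3 interpreter with the standard `hashlib` module; the helper names `file_digest` and `verify` are illustrative and not part of Wire-Swig.

```python
import hashlib

# Digests for wire-swig-0.20.tar.gz, copied from the download list.
EXPECTED = {
    "md5": "9d9ba0fbb9b2493bdd2e42b35ff0656e",
    "sha1": "7f0aff591374a133157c8dfb54293d9a12189242",
}


def file_digest(path, algorithm, chunk_size=65536):
    """Return the hex digest of the file at `path` (hypothetical helper)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True only if every expected digest matches the file."""
    return all(file_digest(path, alg) == digest
               for alg, digest in expected.items())
```

With the tarball in the current directory, `verify("wire-swig-0.20.tar.gz")` should return `True` for an intact download.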
Requirements
- WIRE http://www.cwr.cl/projects/WIRE/
  - tested with WIRE 0.11
- SWIG http://www.swig.org/
  - tested with SWIG 1.3.24
- Ruby, Python, Perl, or another language supported by SWIG. The software has been tested with Ruby 1.8.2 and Python 2.3.5.
Example
Ruby
    # example.rb
    require 'Wire'

    ENV['WIRE_CONF'] = '/path/to/wire.conf'
    idxdir = '/path/to/wire/index'
    idx = Wire::Index.new(idxdir)
    1.upto(idx.count_doc) do |i|
      d = idx.doc_retrieve(i)
      next unless d.mime_type == Wire::MIME_TEXT_HTML
      print idx.url_by_docid(i)
      print idx.retrieve_text_by_docid(i)
    end
Python
    # example.py
    import os
    import Wire

    os.environ['WIRE_CONF'] = '/path/to/wire.conf'
    idxdir = '/path/to/wire/index'
    idx = Wire.Index(idxdir)
    # Document IDs run from 1 to count_doc() inclusive, as in the Ruby example.
    for i in range(1, idx.count_doc() + 1):
        d = idx.doc_retrieve(i)
        # Print only HTML documents.
        if d.mime_type == Wire.MIME_TEXT_HTML:
            print(idx.url_by_docid(i))
            print(idx.retrieve_text_by_docid(i))
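The loop above can also be wrapped in a generator so that callers decide what to do with each document. This is only a sketch: `iter_html_docs` is a hypothetical helper, not part of the bindings, and it assumes that document IDs run from 1 to `count_doc()` inclusive, as the Ruby example suggests. It accepts any object exposing the four methods used in the examples, so it does not import `Wire` itself.

```python
def iter_html_docs(idx, html_mime):
    """Yield (url, text) for every document whose MIME type equals html_mime.

    `idx` is assumed to behave like Wire.Index; `html_mime` is the value
    to compare against, e.g. Wire.MIME_TEXT_HTML.
    """
    for docid in range(1, idx.count_doc() + 1):
        doc = idx.doc_retrieve(docid)
        if doc.mime_type != html_mime:
            continue  # skip non-HTML documents
        yield idx.url_by_docid(docid), idx.retrieve_text_by_docid(docid)
```

With the real bindings loaded, a call would look like `iter_html_docs(Wire.Index(idxdir), Wire.MIME_TEXT_HTML)`.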
License
Copyright 2006 NOKUBI Takatsugu <knok@daionet.gr.jp>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
ToDo
- charset enum support
- read base dir from config
