WikiPedia Search Engine

Step 1 : Parsing the Data

To parse the data, run the file "creating_index_phase2.py". The syntax is "python3 creating_index_phase2.py".

The script first parses the entire dump and creates split index files, each of which is individually sorted. While parsing, it also creates the docToTitle map, which maps each document ID to its title.

It then merges the split index files into the final index. The merge produces multiple final index files that are lexicographically sorted, together with an index metadata file that records the first word and the index file number of each final index file. All the files are stored in the same folder.
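
The merge of sorted split files and the metadata lookup can be sketched as follows. This is a minimal illustration, not the code in "creating_index_phase2.py": the line format ("term postings", with postings joined by "|"), the helper names merge_split_indexes and find_index_file, and the words_per_file threshold are all assumptions made for the example. Split files are modelled as in-memory lists of lines; open file objects could be passed to the same merge.

```python
import bisect
import heapq
from itertools import groupby


def term_of(line):
    """Return the term at the start of an index line (assumed format)."""
    return line.split(" ", 1)[0]


def merge_split_indexes(split_files, words_per_file=2):
    """Merge individually sorted split indexes into sorted final index files.

    Returns (final_files, metadata), where metadata is a list of
    (first_word, file_number) pairs, one per final index file.
    """
    # heapq.merge performs a lazy k-way merge of already sorted inputs.
    merged = heapq.merge(*split_files, key=term_of)

    final_files, metadata, current = [], [], []
    for term, group in groupby(merged, key=term_of):
        # Combine the postings of the same term found in different split files.
        postings = "|".join(line.split(" ", 1)[1] for line in group)
        if not current:
            # First word of a new final index file goes into the metadata.
            metadata.append((term, len(final_files)))
        current.append(f"{term} {postings}")
        if len(current) == words_per_file:
            final_files.append(current)
            current = []
    if current:
        final_files.append(current)
    return final_files, metadata


def find_index_file(metadata, term):
    """Binary-search the metadata for the file that could contain `term`."""
    first_words = [word for word, _ in metadata]
    pos = bisect.bisect_right(first_words, term) - 1
    return metadata[pos][1] if pos >= 0 else None


# Hypothetical split files, each already sorted by term.
split_a = ["apple d1:2", "cat d1:1", "zebra d2:4"]
split_b = ["banana d3:1", "cat d3:5"]
final_files, metadata = merge_split_indexes([split_a, split_b])
print(metadata)
print(find_index_file(metadata, "banana"))
```

Because the final index files are lexicographically sorted, only the first word of each file is needed to decide which single file to open for a given term.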

Step 3 : Search

SYNTAX: python3 search.py

The program loads the docToTitle file along with the final index metadata. Each query returns the top 10 results. Two types of queries are processed:

Normal Queries: single word queries, phrase queries
Field Queries: queries that use a "field:" type of input syntax, where field can be any of title, body, ref, infobox, category, or link
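
The two query types could be told apart with a small parser like the one below. The exact syntax accepted by "search.py" is not spelled out above, so this sketch assumes field queries look like "title:gandhi body:freedom struggle", that a field applies to every term until the next field prefix, and that terms are lowercased. The name parse_query is hypothetical.

```python
FIELDS = ("title", "body", "ref", "infobox", "category", "link")


def parse_query(query):
    """Split a query into (field, term) pairs.

    The field is None for normal queries. The "field:term" syntax is an
    assumption; the real search.py may parse queries differently.
    """
    pairs = []
    field = None
    for token in query.lower().split():
        prefix, sep, rest = token.partition(":")
        if sep and prefix in FIELDS:
            # A recognised "field:" prefix switches the active field.
            field = prefix
            token = rest
        if token:
            pairs.append((field, token))
    return pairs


print(parse_query("sachin tendulkar"))
print(parse_query("title:gandhi body:freedom struggle"))
```

Each (field, term) pair can then be looked up in the index, restricting the match to the named field when one is given.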
