L[0]
, L[1]
, L[2]
, etc.D[key]
returns the value mapped to key
.players[i]
).teams = {
"Yankees": ["Player A", "Player B"],
"Red Sox": ["Player C", "Player D"]
}
The general approach to using dictionaries when solving a problem:
What do I know? (keys)
What do I want to know? (values)
Example: If you have ages and want to find who has that age, the dictionary would be structured as:
ages = {
20: ["Alice", "Bob"],
25: ["Charlie"]
}
Lists as values: Since dictionary keys must be unique, we use lists when multiple values are needed.
Web Indexing Problem Why Build an Index?
The goal of an index is to quickly find which sites contain specific words.
What do we know? → The words we care about.
What do we want to know? → The sites that contain those words.
Dictionary Structure for Indexing
The keys should be words.
The values should be a list of websites that contain those words.
index = {
"python": ["site1.com", "site2.com"],
"coding": ["site3.com"]
}
Textbook Analogy
Think of a textbook index: it helps locate which pages contain specific words.
Similarly, a web index helps locate which websites contain certain words.
Implementation Details Function: index_site
Arguments:
site: A string (URL of the site).
text: A single string containing all the text from the site.
This function indexes only one site at a time (like adding a new page to a textbook).
Use .split() to extract words from text.
Convert text to lowercase to avoid duplicate entries due to case differences.
Do not print or request input inside this function!
Function: search_single_word
Should return a list of sites that contain the given word.
If no sites are found, return an empty list, not a printed message.
Key Considerations
Avoid duplicate sites in results.
Ensure case insensitivity by lowercasing the text.
Test your search functionality and refine details.
This project enhances a basic web indexer by tracking word frequencies and improving search efficiency.
Instead of storing words as keys mapping to a list of sites, we modify the structure to store word → {site: frequency} mappings:
index = {
"mit": {"web.mit.edu": 3, "csail.mit.edu": 2},
"google": {"web.mit.edu": 1, "google.com": 3}
}
Step 1: Implementing index_site
This function processes the text of a given site and updates the index accordingly.
import re
def index_site(index, site, content):
words = re.findall(r'\w+', content.lower()) # Extract words and convert to lowercase
for word in words:
if word not in index:
index[word] = {site: 1} # Case 1: New word
else:
if site in index[word]:
index[word][site] += 1 # Case 3: Word exists, site exists → Increment count
else:
index[word][site] = 1 # Case 2: Word exists, site doesn't → Add site with count 1
Step 2: Implementing search_single_word
This function retrieves a sorted list of (site, frequency) tuples for a given word.
def search_single_word(index, word):
if word in index:
result = list(index[word].items()) # Convert site-frequency dictionary to a list
result.sort(key=lambda x: x[1], reverse=True) # Sort by frequency (highest first)
return result
return []
Example Usage:
index = {
"mit": {"web.mit.edu": 3, "csail.mit.edu": 2},
"google": {"web.mit.edu": 1, "google.com": 3}
}
print(search_single_word(index, "mit"))
# Output: [('web.mit.edu', 3), ('csail.mit.edu', 2)]
Step 3: Implementing search_multiple_words
This function searches for multiple words and aggregates their frequencies across sites.
def search_multiple_words(index, words):
site_freq = {}
for word in words:
for site, freq in search_single_word(index, word):
if site in site_freq:
site_freq[site] += freq # Combine frequencies across words
else:
site_freq[site] = freq
result = list(site_freq.items())
result.sort(key=lambda x: x[1], reverse=True) # Sort by frequency
return result
Example Usage:
print(search_multiple_words(index, ["mit", "google"]))
# Output: [('web.mit.edu', 4), ('google.com', 3), ('csail.mit.edu', 2)]
To mark this module as complete, you must finish this quiz. Once submitted, you'll need to wait 2 hours before attempting it again.