Bloom Filters and its applications
Project Instalation
- clone the project
- There are two application one is for password and another one for intersection
- for password app
- Go to src/app/bloom_filter/password
- To run: python newpass.py
- please enter values with double quote
- for Intersection app
- Go to src/app/bloom_filter/intersection
- To run: python intersection.py
- please enter values with double quote
Problem Statement
What is the fastest and space efficient way to search for a word that is present or not?
- HashTable?
- Tries? Above two data structure optimised in time complexity but comprimise in space. To overcome all these problems and to do it effective time and space we use Bloom filters
Introduction
A Bloom filter is a space-efficient probabilistic data structure, that is used to test whether an element is a member of a set. Bloom filter options ➢ Insert ➢ Search
Insertion
- An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution.
- To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.
Search
- To search an element, feed it to each of the k hash functions to get k array positions
- If any of the bits at these positions is 0, the element is definitely not in the set – if it were, then all the bits would have been set to 1 when it was inserted
- If all are 1, then either the element is in the set, or the bits have by
chance been set to 1 during the insertion of other elements, resulting
in a false positive.
Delete in Bloom filters
Deleting elements from filter is not possible because, if we delete a single element by clearing bits at indices generated by k hash functions, it might cause deletion of few other elements. Bloom filters do not permit deletion of items
Application of Bloom filters
1.Password :
Whenever you are trying to set password for something , it should be strong enough and be harder to guess.Here we are doing a small approach using bloom filters which avoid users keeping easy passwords.
2.Intersection :
When you are given two huge sets of data and is asked to find the common data between them. We can easily find them using Bloom filters.