Search
Patexia Research
Patent No. US 11275739
Issue Date Mar 15, 2022
Claim this patent
PDF Unavailable

Patent 11275739 - Prefix indexing > Claims

  • 1. A system comprising: at least one hardware processor; andat least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:accessing a source table organized into a set of batch units;decomposing a data item from the source table into multiple segments, the multiple segments comprising a root segment and a child segment;generating a set of fingerprints for the data item based on the multiple segments, the set of fingerprints comprising a first fingerprint generated based on the root segment and a second fingerprint generated based on the child segment and the first fingerprint; andgenerating an index for the source table based on the set of fingerprints, the index comprising a set of filters, the generating of the index comprising: populating a filter in the set of filters with a first number of bits based on the first fingerprint, andpopulating the filter in the set of filters with a second number of bits based on the second fingerprint.
    • 2. The system of claim 1, wherein the generating of the set of fingerprints comprises: generating the first fingerprint based on the root segment; andgenerating the second fingerprint based on the child segment and the first fingerprint.
      • 3. The system of claim 2, wherein: generating the first fingerprint comprises computing a first hash over the root segment; andgenerating the second fingerprint comprises computing a second hash over the child segment using the first hash as a seed for a hashing function used to compute the second hash.
        • 4. The system of claim 3, wherein: the child segment is a first child segment;the multiple segments further comprise a second child segment; andthe generating of the set of fingerprints for the data item further comprises generating a third fingerprint based on the second child segment and the second fingerprint.
          • 5. The system of claim 4, wherein the generating of the third fingerprint comprises computing a third hash over the second child segment using the second hash as a seed for a hashing function used to compute the third hash.
        • 6. The system of claim 3, wherein: the operations further comprise determining, based on the first hash, the filter from the set of filters to populate using the set of fingerprints.
    • 7. The system of claim 1, wherein the first number of bits is greater than the second number of bits.
    • 8. The system of claim 1, wherein the root segment and the child segment of the data item have a hierarchical relationship.
    • 9. The system of claim 1, wherein the operations further comprise: receiving a query directed at the source table, the query specifying a search pattern;pruning the set of batch units to scan for data matching the search pattern using the index, the pruning of the set of batch units comprising identifying a subset of batch units to scan for matching data; andprocessing the query by scanning the subset of batch units.
  • 10. A method comprising: accessing a source table organized into a set of batch units;decomposing a data item from the source table into multiple segments, the multiple segments comprising a root segment and a child segment;generating a set of fingerprints for the data item based on the multiple segments, the set of fingerprints comprising a first fingerprint generated based on the root segment and a second fingerprint generated based on the child segment and the first fingerprint; andgenerating an index for the source table based on the set of fingerprints, the index comprising a set of filters, the generating of the index comprises: populating a filter in the set of filters with a first number of bits based on the first fingerprint, andpopulating the filter in the set of filters with a second number of bits based on the second fingerprint.
    • 11. The method of claim 10, wherein the generating of the set of fingerprints comprises: generating the first fingerprint based on the root segment; andgenerating the second fingerprint based on the child segment and the first fingerprint.
      • 12. The method of claim 11, wherein: generating the first fingerprint comprises computing a first hash over the root segment; andgenerating the second fingerprint comprises computing a second hash over the child segment using the first hash as a seed for a hashing function used to compute the second hash.
        • 13. The method of claim 12, wherein: the child segment is a first child segment;the multiple segments further comprise a second child segment; andthe generating of the set of fingerprints for the data item further comprises generating a third fingerprint based on the second child segment and the second fingerprint.
          • 14. The method of claim 13, wherein the generating of the third fingerprint comprises computing a third hash over the second child segment using the second hash as a seed for a hashing function used to compute the third hash.
        • 15. The method of claim 12, wherein: the method further comprises determining, based on the first hash, the filter from the set of filters to populate using the set of fingerprints.
    • 16. The method of claim 10, wherein the first number of bits is greater than the second number of bits.
    • 17. The method of claim 10, further comprising: receiving a query directed at the source table, the query specifying a search pattern;pruning the set of batch units to scan for data matching the search pattern using the index, the pruning of the set of batch units comprising identifying a subset of batch units to scan for matching data; andprocessing the query by scanning the subset of batch units.
  • 18. A computer-storage medium comprising instructions that, when executed by one or more processors of a machine, configure the machine to perform operations comprising: accessing a source table organized into a set of batch units;decomposing a data item from the source table into multiple segments, the multiple segments comprising a root segment and a child segment;generating a set of fingerprints for the data item based on the multiple segments, the set of fingerprints comprising a first fingerprint generated based on the root segment and a second fingerprint generated based on the child segment and the first fingerprint; andgenerating an index for the source table based on the set of fingerprints, the index comprising a set of filters, the generating of the index comprises: populating a filter in the set of filters with a first number of bits based on the first fingerprint, andpopulating the filter in the set of filters with a second number of bits based on the second fingerprint.
    • 19. The computer-storage medium of claim 18, wherein the generating of the set of fingerprints comprises: generating the first fingerprint based on the root segment; andgenerating the second fingerprint based on the child segment and the first fingerprint.
      • 20. The computer-storage medium of claim 19, wherein: generating the first fingerprint comprises computing a first hash over the root segment; andgenerating the second fingerprint comprises computing a second hash over the child segment using the first hash as a seed for a hashing function used to compute the second hash.
        • 21. The computer-storage medium of claim 20, wherein: the child segment is a first child segment;the multiple segments further comprise a second child segment; andthe generating of the set of fingerprints for the data item further comprises generating a third fingerprint based on the second child segment and the second fingerprint.
          • 22. The computer-storage medium of claim 21, wherein the generating of the third fingerprint comprises computing a third hash over the second child segment using the second hash as a seed for a hashing function used to compute the third hash.
        • 23. The computer-storage medium of claim 20, wherein: the operations further comprise determining, based on the first hash, a filter from the set of filters to populate using the set of fingerprints.
    • 24. The computer-storage medium of claim 18, wherein the first number of bits is greater than the second number of bits.
    • 25. The computer-storage medium of claim 18, wherein the root segment and the child segment of the data item have a hierarchical relationship.
Menu