Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to search through a list of objects? #7

Closed
christopher-caldwell opened this issue Oct 10, 2022 · 15 comments
Closed

Is it possible to search through a list of objects? #7

christopher-caldwell opened this issue Oct 10, 2022 · 15 comments
Labels
question Further information is requested

Comments

@christopher-caldwell
Copy link

One of the big reasons why I use Fuse is that I can pass a list of objects and it will search on a subset of keys. This requires additional indexing, I assume. Is it possible to do this with uFuzzy? It seems to only accept string[]

@leeoniya
Copy link
Owner

leeoniya commented Oct 10, 2022

out of the box, no. but with bit of cleverness, yes.

the idea is to create a string[] by concat'ing the fields you're interested in, using some rarely-used separator char like ¦. this effectively becomes your "index". once you have this list you can use uFuzzy as normal. it's not a perfect solution, but i think it will work quite well.

it would take a bit more code to get the highlight ranges (info.ranges) properly partitioned based on offsets from prior ¦, but may not be necessary if you don't intend to do partial match highlighting in the UI.

this use case is on my demos TODO list, PRs welcome ;)

@leeoniya leeoniya added the question Further information is requested label Oct 10, 2022
@leeoniya
Copy link
Owner

leeoniya commented Oct 10, 2022

i should add that this strategy will work in an AND fashion for multi-term needles (same as any multi-term uFuzzy search)

if you want to do OR or more advanced things, you'll probably want to make a different haystack array for each field and run separate uFuzzy matches for each haystack (maybe with different opts). then combine and "re-rank" matches that overlap, etc.

depending on what you need, at some point it may make sense to use a proper document search lib with an inverted index, etc., but given its poor performance, Fuse wouldn't be my first choice.

@christopher-caldwell
Copy link
Author

christopher-caldwell commented Oct 12, 2022

I am actively looking for an alternative to Fuse, so I am certainly interested. To add a bit more context, It's not a complex search where something like Elastic would be applicable. My use case (btw, happy to be told I am doing it wrong) is something like, you fetch a set of records, maybe virtualized on the UI. Client side search to quickly filter the list down to what they need.

type RadioStation {
  callLetters: string
  title: string
  hostName: string
}

Say I want to search "Jim" on an array of the above structure. It should return because hostName has a "Jimmy Neutron".

Hopefully this isn't annoying, I'd love to use something besides Fuse. Based on your second comment, my intuition is to create fields arrays. Using the above, 3 arrays, one for each the keys. Then you search on each of the arrays? Then have to combine the results back together.

@leeoniya
Copy link
Owner

something like this (incomplete):

const uFuzzy = require('../dist/uFuzzy.cjs');

let recs = [
    {
        foo: "Martha came to the conclusion that shake weights are a great gift for any occasion.",
        bar: "his seven-layer cake only had six layers.",
        baz: "The urgent care center was flooded with patients after the news of a new deadly virus was made public.",
    },
    {
        foo: "I never knew what hardship looked like until it started raining bowling balls.",
        bar: "It didn't make sense unless you had the power to eat colors.",
        baz: "For oil spots on the floor, nothing beats parking a motorbike in the lounge.",
    },
    {
        foo: "They were excited to see their first sloth.",
        bar: "The rusty nail stood erect, angled at a 45-degree angle, just waiting for the perfect barefoot to come along.",
        baz: "She wanted a pet platypus but ended up getting a duck and a ferret instead.",
    },
    {
        foo: "They say that dogs are man's best friend, but this cat was setting out to sabotage that theory.",
        bar: "He poured rocks in the dungeon of his mind.",
        baz: "He was willing to find the depths of the rabbit hole in order to be with her.",
    },
    {
        foo: "When I was little I had a car door slammed shut on my hand and I still remember it quite vividly.",
        bar: "She advised him to come back at once.",
        baz: "A purple pig and a green donkey flew a kite in the middle of the night and ended up sunburnt.",
    },
    {
        foo: "She looked at the masterpiece hanging in the museum but all she could think is that her five-year-old could do better.",
        bar: "We should play with legos at camp.",
        baz: "Trash covered the landscape like sprinkles do a birthday cake.",
    },
    {
        foo: "It was obvious she was hot, sweaty, and tired.",
        bar: "As he waited for the shower to warm, he noticed that he could hear water change temperature.",
        baz: "It took him a while to realize that everything he decided not to change, he was actually choosing.",
    },
    {
        foo: "She moved forward only because she trusted that the ending she now was going through must be followed by a new beginning.",
        bar: "Fluffy pink unicorns are a popular status symbol among macho men.",
        baz: "I am counting my calories, yet I really want dessert.",
    },
    {
        foo: "Happiness can be found in the depths of chocolate pudding.",
        bar: "Strawberries must be the one food that doesn't go well with this brand of paint.",
        baz: "Joe made the sugar cookies; Susan decorated them.",
    },
    {
        foo: "Poison ivy grew through the fence they said was impenetrable.",
        bar: "Nothing is as cautiously cuddly as a pet porcupine.",
        baz: "Dan ate the clouds like cotton candy.",
    },
];

let haystack = recs.map(r => `${r.foo}¦${r.bar}¦${r.baz}`);

let needle = 'was';

let uf = new uFuzzy();

let idxs = uf.filter(haystack, needle);

console.log(idxs.map(i => recs[i]));

@christopher-caldwell
Copy link
Author

Thanks for answering!

@swyxio
Copy link

swyxio commented Feb 4, 2023

@leeoniya i think the other part of doing objects is that you naturally want some fields to rank higher than other fields, so there's enough here that it should be a feature IMO

@leeoniya
Copy link
Owner

leeoniya commented Feb 4, 2023

@sw-yx you should be able to use info.ranges to work out which portions of each match land within which segments, then use this to help with sorting.

of course each match can be cross-field when they're all concatenated. so you may want to run matches on each field as separate haystacks, then combine the results in the order of your most to least important haystack.

feel free to submit a demo if you get something interesting working. kinda related to #19.

uFuzzy does not work with objects so any kind of object search would have to be a wrapper and not a core API.

@swyxio
Copy link

swyxio commented Feb 7, 2023

thanks for the pointer - i have this in a todo issue right now.. not sure if i'll get to this but hopefully when i have some downtime

@stargazer33
Copy link

stargazer33 commented Apr 7, 2023

@leeoniya it would be great to add something like this #7 (comment) to documentation/readme!

My case is very similar to @christopher-caldwell case : want to migrate search from using lunrjs to something better... saw this project... and than... spend hours searching here and there for the answer to "how can I use uFuzzy with list of objects?"

I think many developers using lists of objects...

@jd1378
Copy link

jd1378 commented May 6, 2023

@leeoniya is it possible at least have some way to use objects as is and use an option to access the object's key we want to search in ?
like we have this:

const a = [
  {
     name: 'foo'
  },
  {
    name: 'bar'
  }
]

now having thousands of these, is it still efficient to map them ? wouldn't just passing the object to uFuzzy and then specifying name as key for search be more efficient ?

@leeoniya
Copy link
Owner

leeoniya commented May 6, 2023

the overhead should be very small. the problem is that the use-objects approach becomes not useful when you have more than one property to search, or deep objects. the flat string array is universal.

@jd1378
Copy link

jd1378 commented May 6, 2023

I see
you are right

@punkpeye
Copy link

punkpeye commented Feb 2, 2025

Has anyone build abstraction that would allow search of object collection?

@punkpeye
Copy link

punkpeye commented Feb 3, 2025

This is what I ended up to implement https://glama.ai search.

/* eslint-disable unicorn/no-array-method-this-argument */

import { UnexpectedStateError } from '#app/errors';
import uFuzzy from '@leeoniya/ufuzzy';

const getRanges = (subject: string, needle: string) => {
  const uf = new uFuzzy();

  const haystack = [subject];

  // our haystack is an array of 1, therefore we can assume the first match is the only match
  const ids = uf.filter(haystack, needle) as [0] | [] | null;

  if (!ids || ids.length === 0) {
    return [];
  }

  const info = uf.info(ids, haystack, needle);

  return info.ranges[0];
};

export const highlightRanges = uFuzzy.highlight.bind(uFuzzy);

export type FuzzySearchResult<T> = {
  item: T;
  ranges: Record<keyof T, number[] | undefined>;
};

export const fuzzySearch = <T extends Record<string, unknown>>({
  fields,
  items,
  searchTerm,
}: {
  fields: Array<keyof T>;
  items: T[];
  searchTerm: string;
}): Array<FuzzySearchResult<T>> => {
  if (searchTerm === null) {
    return [];
  }

  const values = items.map((item) => {
    return fields
      .map((field) => {
        if (typeof item[field] === 'string') {
          return item[field];
        }

        return '';
      })
      .join('|');
  });

  const uf = new uFuzzy();

  const [indexes, info] = uf.search(values, searchTerm);

  if (!indexes || !info) {
    return [];
  }

  return indexes.map((index) => {
    const item = items[index];

    if (!item) {
      throw new UnexpectedStateError('Could not find item in haystack');
    }

    const ranges = {} as Record<keyof T, number[]>;

    for (const field of fields) {
      ranges[field] = getRanges(String(item[field]), searchTerm);
    }

    return {
      item,
      ranges,
    };
  });
};

It is not perfect (e.g. ranges will break if match is across multiple fields), but does the job.

Hope others find it useful too

@punkpeye
Copy link

punkpeye commented Feb 3, 2025

The hackiest part is getRanges.

That probably could be avoided by calculating range offset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

6 participants