[Solved] NuGetGallery: Improve search relevance on NuGet.org

  1. Search for Microsoft Bot Connector vs. Microsoft Bot Connector - obvious that search is broken here.
    Reference: https://twitter.com/adpedley/status/874460132725235712
  2. #2789: Support partial search terms, e.g. "Microsoft.Bot.Con". That search currently returns no results.
  3. #782: Provide gestures for search filters
  4. Enable auto complete in the search box.
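
To illustrate what items 2 and 4 are asking for, here is a minimal sketch of prefix matching over package IDs. It is not how NuGet.org search works; the function name `prefix_matches`, the `limit` parameter, and the tiny in-memory ID list are all assumptions made for the example.

```python
import bisect

# Hypothetical in-memory index: lower-cased package IDs, kept sorted.
IDS = sorted(
    package_id.lower()
    for package_id in [
        "Microsoft.Bot.Connector",
        "Microsoft.CodeAnalysis.CSharp",
        "Newtonsoft.Json",
        "EntityFramework",
    ]
)


def prefix_matches(prefix, sorted_ids, limit=5):
    """Return up to `limit` IDs that start with `prefix` (case-insensitive)."""
    p = prefix.lower()
    # All IDs sharing the prefix sit next to each other in sorted order,
    # so jump to the first candidate and scan forward until the run ends.
    start = bisect.bisect_left(sorted_ids, p)
    matches = []
    for package_id in sorted_ids[start:]:
        if len(matches) == limit or not package_id.startswith(p):
            break
        matches.append(package_id)
    return matches
```

With this sketch, `prefix_matches("Microsoft.Bot.Con", IDS)` finds `microsoft.bot.connector`, and the same lookup could feed an auto-complete dropdown.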
47 Answers

✔️Accepted Answer

@skofman1 I think the results should be weighted much more heavily by popularity. For example, when I search for "csharp", hoping to find Microsoft.CodeAnalysis.CSharp, some of the results are:

Position  Package                                   Downloads
2         iTextSharp                                1 800 000
3         CefSharp.WinForms                         180 000
6         Tarantool.CSharp                          2 500
7         Hallmanac.Funqy.CSharp                    30
8         Microsoft.CodeAnalysis.Scripting.CSharp   36 000
9         aliyun.oss.csharp.sdk.netstandard1.6      0
10        web-csharp-001-core                       330
13        Microsoft.CodeAnalysis.CSharp             6 000 000

I don't see any reason why a package that nobody ever downloaded should be higher than a package that was downloaded six million times for this search term.
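
To make the suggestion concrete, here is a rough sketch of popularity-weighted ranking using the download counts listed above. This is not NuGet.org's actual scoring: the whole-token match, the log scaling, and the names `tokens` and `rank` are assumptions chosen for the example.

```python
import math
import re

# Package IDs and download counts copied from the results listed above.
PACKAGES = {
    "iTextSharp": 1_800_000,
    "CefSharp.WinForms": 180_000,
    "Tarantool.CSharp": 2_500,
    "Hallmanac.Funqy.CSharp": 30,
    "Microsoft.CodeAnalysis.Scripting.CSharp": 36_000,
    "aliyun.oss.csharp.sdk.netstandard1.6": 0,
    "web-csharp-001-core": 330,
    "Microsoft.CodeAnalysis.CSharp": 6_000_000,
}


def tokens(package_id):
    """Split an ID on '.', '-' and '_' into lower-cased tokens."""
    return [t for t in re.split(r"[.\-_]", package_id.lower()) if t]


def rank(query, packages=PACKAGES):
    """Keep IDs that contain the query as a whole token, most downloaded first."""
    q = query.lower()
    scored = []
    for package_id, downloads in packages.items():
        if q not in tokens(package_id):
            continue  # assumed matching rule: whole-token match only
        # Log scaling keeps huge packages from dwarfing everything else,
        # while a never-downloaded package still scores 0.
        scored.append((math.log10(1 + downloads), package_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [package_id for _, package_id in scored]
```

Under these assumptions, `rank("csharp")` puts Microsoft.CodeAnalysis.CSharp first and the package with zero downloads last. iTextSharp and CefSharp.WinForms drop out only because this sketch matches whole tokens; the real search evidently matches them some other way.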

Other Answers:

Even for popularity, it's not quite accurate either....

Search for Rx has Rx.NET on page 6 of the results…which no one will ever see.
https://preview.nuget.org/packages?q=rx&page=6

System.Reactive, which is Rx.NET, has 347k downloads for the main package alone (for 3.x, there are 7-8 packages).

No package in the result set on the previous 5 pages has more downloads.

Ranking by popularity still does not explain why tens of unrelated packages appear before the one whose ID actually matches.


We chose not to always put the exact match first because there are common searches where this falls over.

  • Search "entity". You very likely want EntityFramework or similar. Not the Entity package with 3000 downloads.
  • Similarly "testing".
  • "json". People almost always want Newtonsoft.Json, not the "JSON" package.
  • ... Many other examples of generically named unpopular packages.

Our search click data and user feedback show that people do not usually want the generic name package. They want the popular one.

For cases where the name is more specific ("nlog") and the exact match is very popular, of course it should be first.

For cases where the package is not hugely popular but still an exact match and the ID is long and specific ("serilog.sinks.elmahio") I agree it should probably be first. We will likely improve this with another fix that we are planning.
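
The rule of thumb described above could be sketched roughly as follows. This is only an illustration, not the gallery's implementation or the planned fix: the two thresholds and the function name `promote_exact_match` are hypothetical, and it assumes the results arrive already ordered by relevance.

```python
# Both thresholds are assumptions for this sketch, not real NuGet.org settings.
POPULAR_DOWNLOADS = 1_000_000  # "very popular" cut-off
SPECIFIC_ID_LENGTH = 15        # IDs at least this long count as "long and specific"


def promote_exact_match(query, results):
    """Move an exact ID match to the top only when it is popular or specific.

    `results` is a list of (package_id, downloads) tuples, already ordered
    by relevance. A new list is returned; the input is left untouched.
    """
    q = query.lower()
    for index, (package_id, downloads) in enumerate(results):
        if package_id.lower() != q:
            continue
        is_popular = downloads >= POPULAR_DOWNLOADS
        is_specific = len(package_id) >= SPECIFIC_ID_LENGTH
        if is_popular or is_specific:
            return [results[index]] + results[:index] + results[index + 1:]
        break  # generic, unpopular exact match: leave the order alone
    return list(results)
```

With these made-up thresholds, a short, rarely downloaded ID such as "Entity" stays where relevance put it, while a long ID such as "serilog.sinks.elmahio" is promoted even with modest downloads.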

Another approach we considered was to follow what other package managers do for an exact match and put that result in a special colored box or give it a special label. This is less a relevancy thing and more a special search result that comes back from some searches in parallel with the "relevant" results. That idea is tracked here:
#7463. Please leave feedback on that issue if you have thoughts 😀.
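
As a rough picture of that idea, the exact match could travel next to the ranked list instead of being forced into it. The response shape and the name `search_with_exact_match` below are assumptions for illustration, not the design tracked in #7463.

```python
def search_with_exact_match(query, ranked_ids):
    """Return the ranked results unchanged, plus any exact ID match on the side.

    `ranked_ids` is a list of package IDs already ordered by relevance.
    The UI could render `exact_match` in its own box or with a label.
    """
    q = query.lower()
    exact = next((pid for pid in ranked_ids if pid.lower() == q), None)
    return {"exact_match": exact, "results": list(ranked_ids)}
```

For a "json" search this would keep Newtonsoft.Json at the top of `results` while still surfacing the "JSON" package separately.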

As with all search relevancy changes, it's a balancing act. We tried to get it right in this iteration, but we certainly missed some cases. From our user feedback and current top queries, we know we improved overall, but cases like this are still not perfect. Keep the examples flowing and we will weigh them against the rest of the search and non-search work items in our backlog.

I believe the search has been vastly improved! Great work!