MongoDB - Aggregation Framework - series #02

Intro Note

You know, designing systems is about shaping data. You want the tools that facilitate the ability to shape data exactly how you want it.

Let's try another example in this series of articles:

Suppose you have a collection of musical artists

  • who have an arr of album titles
  • which have an arr of song titles

How do you query that nested level arr

  • and filter for a particular match. For example, all artists that have song titles that have the word "beast" in it.

Also, we don't want just the list of songs,

  • we want to retrieve the complete set of arists who have a song title that match that criteria, plus its arr of matched song titles that contain the searched-for term.
/*
@datamodel
[
 {
  artist: "",
  bio: "",
  genre: [],
  albums: [
   {
    albumTitle: ""
    songTitles: [
     {
      song: ""
     }
    ]
    }
   ]
  }
 ]

*/

In this test example, let's keep our test resultset small, by retrieving only the first 5 results, and limiting the returning fields (projected fields) to only the artist and searchResult fields:


const searchTerm = "beast";
const regex = new RegExp(`${searchTerm}`, "i");

//albumSongs by regex
lpromiseStringify(
  mgArr(dbEnum.nlpdb, collEnum.songsColl,
    copyField("searchResult", "albums"),
    unwindArr("searchResult"),
    unwindArr("searchResult.albumSongs"),
    matchRegex("searchResult.albumSongs.song", regex),
    limit(5),
    projectIncludeNoId("artist", "searchResult"),
  )
);

/*

@output

[
  {
    "artist": "Peter, Paul & Mary",
    "searchResult": {
      "albumTitle": "album: A Holiday Celebration (1988)",
      "albumSongs": {
        "song": "The Friendly Beasts"
      }
    }
  },
  {
    "artist": "Peabo Bryson",
    "searchResult": {
      "albumTitle": "album: Through The Fire (1994)",
      "albumSongs": {
        "song": "Beauty And The Beast"
      }
    }
  },
  {
    "artist": "Fugees",
    "searchResult": {
      "albumTitle": "album: The Score (1996)",
      "albumSongs": {
        "song": "The Beast"
      }
    }
  },
  {
    "artist": "Classified",
    "searchResult": {
      "albumTitle": "album: Tomorrow Could Be The Day Things Change (2018)",
      "albumSongs": {
        "song": "Beastie Boy"
      }
    }
  },
  {
    "artist": "John Anderson",
    "searchResult": {
      "albumTitle": "album: Goldmine (2015)",
      "albumSongs": {
        "song": "Louisiana Son Of A Beast"
      }
    }
  }
]

*/

Notes

  • Some of this functionality is explained in the first article in this series, so feel free to read that article first.

  • the search term will be coming in dynamically, so we have to convert the string into a Regex at runtime.

  • lpromiseStringify is like lpromise (log the promise), but stringifying it first allows us to fully log the nested arrs and objs in the log output.

  • copyField is a wrapper around the $addField command. in this case it creates a copy of the field "albums", and calls the newly copied field, "searchResult" $addField is used for other use case, I'll talk about that in the future. We need to make a copy the albums first, because by our spec, we want to return the full album set to the caller. The new copy of it will be mutated by being filtered down to only the results that contain the search term.

  • Now that we have a copy of the albums field (an arr of albums) , we'll apply a powerful command called $unwind. My wrapper func is called unwindArr. The trailing "Arr" in the func name is a reminder that this command only works on fields of type arr. It's like a flatten. It creates a new root document for each elem in the arr. So if an artist has 5 albums, we'll now have 5 artist objects that only differ by the albums field, which is not an arr anymore, but is the value of the elem that was in the arr. It's a way around doing nested for-loops. It's a very useful, popular, and powerful command. We'll talk more about it later.

  • Notice with dot notation, we can keep drilling down into the nested arrs until we get to the data we want: unwindArr("searchResult.albumSongs"),

  • Because the nested arrs are unwound, (or flattened), they are just an obj in an obj in an obj, etc. So we can access the value with dot notation (thus no explicit loops) matchRegex("searchResult.albumSongs.song", regex),

  • If you use the "projectInclude" func, all fields are automatically exclude except for those explicitly listed. The trailing "NoId" part of this variant of the func forces excluding the PK (primary key), which the framework forces including by default. In your production queries you'll typically always include the PK, but I use this "NoID" variant for testing when the PK is not part of what I'm testing. It removes that noise.

What's next

If anything is unclear, or you have any questions, let me know.

We'll be getting more into the utility and wrapper funcs that make up this data-shaping system in the future. Most of them are just simple oneliners.

Resources

You'll notice I use some abbreviations like arr, obj, elem, func. Here is the whole list of them: dev.to/functional_js/popular-abbreviations-..

$addField command docs.mongodb.com/manual/reference/operator/..

$unwind command docs.mongodb.com/manual/reference/operator/..