High Volume Querying

If you are querying one of the datasets in the one-mnhs API and need to retrieve more than 10,000 records, you will need to use one of the high volume querying endpoints instead of a standard query endpoint. These endpoints require special access; if you do not have it, you will need to request it.

Assuming you do have that access, you can retrieve records from a high volume query endpoint as shown in the example below. The example uses the collections dataset, but all the high volume querying endpoints work the same way.

Note

Not all datasets in the one-mnhs API support high volume queries. The smaller datasets, for example, don’t have enough records to ever need high volume querying. Check the OpenAPI docs to verify that your target dataset supports high volume querying.
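
If you already have the API's OpenAPI document loaded as a dictionary, a check along the following lines may help. This is only a sketch: `supports_high_volume` is a hypothetical helper, the example spec is made up for illustration, and it assumes every high volume endpoint follows the `/<dataset>/high-volume-query` path pattern seen in the collections example.

```python
def supports_high_volume(spec: dict, dataset: str) -> bool:
    """Return True if the spec lists a high volume query path for the dataset.

    Assumes the path pattern "/<dataset>/high-volume-query", which is inferred
    from the collections example and not confirmed for other datasets.
    """
    path = f"/{dataset}/high-volume-query"
    # "paths" is the standard top-level OpenAPI object that maps each
    # endpoint path to its operations.
    return path in spec.get("paths", {})


# Hypothetical, trimmed-down spec used only for illustration.
example_spec = {
    "paths": {
        "/collections/query": {},
        "/collections/high-volume-query": {},
        "/small-dataset/query": {},
    }
}

print(supports_high_volume(example_spec, "collections"))
print(supports_high_volume(example_spec, "small-dataset"))
```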

The key difference between the high volume query endpoints and standard query endpoints is that you will page through the results using the "nextPageToken" rather than specifying a page you wish to retrieve.

import json

import requests

# Placeholder: substitute the access token issued to you.
access_token = "YOUR_ACCESS_TOKEN"

next_page_token: str | None = None
results: list[str] = []

while True:
    resp = requests.post(
        "https://mnhs-rma-prod.uc.r.appspot.com/collections/high-volume-query",
        data=json.dumps(
            {
                # You MUST supply at least one sort for high volume querying;
                # if you do not, the API will respond with an error.
                "sorts": [{"field": "alternateId"}],
                # Filters can be whatever you want; this is just one
                # possibility:
                "filters": [
                    {"field": "itemType", "op": "=", "term": "Photographs"},
                    {"field": "hasMedia", "op": "=", "term": "true"},
                ],
                "nextPageToken": next_page_token,
            }
        ),
        # Set your pageSize based on the speed of your connection; 1000 is
        # just an example.
        params={"pageSize": 1000},
        headers={"Authorization": f"Bearer {access_token}"},
    )
    if resp.status_code == 200:
        result = json.loads(resp.content)
        if result["totalHitCount"] == 0:
            print("No results found.")
            break
        else:
            # Retrieve the nextPageToken for use in your next request:
            next_page_token = result["nextPageToken"]
            for hit in result["hits"]:
                # Process each hit as needed; here we collect the record IDs.
                results.append(hit["id"])
            # hasNext will be returned as false once you've reached the end
            # of the pages:
            if not result["hasNext"]:
                break
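    else:
        # Stop on any non-200 response and report it instead of exiting
        # the loop silently.
        print(f"Request failed with status {resp.status_code}.")
        break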

Note

When using the high volume query endpoints, you must retrieve all the pages in order; you cannot jump around to specific pages the way you can with the standard query endpoints.
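
One way to keep that ordering explicit is to wrap the paging loop in a generator that only ever follows the token from the previous response. The sketch below is an illustration, not part of the API: `iter_hits` and `fetch_page` are hypothetical names, and the fake pages stand in for real responses (including the assumption that the final page carries no usable token).

```python
from collections.abc import Callable, Iterator
from typing import Any, Optional


def iter_hits(
    fetch_page: Callable[[Optional[str]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield hits page by page, strictly in the order the API returns them.

    fetch_page is a hypothetical callable that takes the current
    nextPageToken (None for the first page) and returns the decoded
    JSON body of one response.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page["hits"]
        if not page["hasNext"]:
            return
        # The only way forward is the token from the page just read.
        token = page["nextPageToken"]


# Hypothetical stand-in for the API: three pages keyed by token.
_fake_pages = {
    None: {"hits": [{"id": "a"}, {"id": "b"}], "hasNext": True, "nextPageToken": "t1"},
    "t1": {"hits": [{"id": "c"}], "hasNext": True, "nextPageToken": "t2"},
    "t2": {"hits": [{"id": "d"}], "hasNext": False, "nextPageToken": None},
}


def fake_fetch_page(token: Optional[str]) -> dict[str, Any]:
    return _fake_pages[token]


ids = [hit["id"] for hit in iter_hits(fake_fetch_page)]
print(ids)  # ['a', 'b', 'c', 'd']
```

In real use, `fetch_page` would perform the POST request from the example above and return the decoded response body.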