Making the Bot - Part I

Now that our app has an authenticated session we can start hitting the API. Let’s go over the general flow of hitting an API endpoint.

First, we’ll need to include our access token with any request we make that requires OAuth. This is accomplished by including “Authorization: ‘bearer TOKEN’ ” in our header information, where TOKEN is the access token you acquired when you authenticated. All of Reddit’s OAuth endpoints require us to hit oauth.reddit.com, and it’s advisable to use https over http for security reasons.

Trying it out

Let's look at the details of the user account we're logged in as. To do this we'll hit the api/v1/me endpoint. This is a very simple endpoint that requires nothing more than sending a GET request to the URL while logged in. Remember, the base URL for all Reddit API requests is "oauth.reddit.com" and we want JSON data back which means the full URL of the endpoint is "https://oauth.reddit.com/api/v1/me.json". So our request for this endpoint looks like:


options = {
        url: "https://oauth.reddit.com/api/v1/me.json",
        method: 'GET',
        headers: {
            'User-Agent': 'YOUR_USER_AGENT',
            'Authorization': ‘bearer TOKEN’
}

request(options, function(err, res, body) {
    var acctData = JSON.parse(body);
    /* Do something with the account data */
});
Easy! Now let's do something more interesting.

The Bot

While Reddit has several features for finding threads relevant to a user's interests, it can be difficult to find threads that aren't brand new or trending while they're still active. There's also no way to monitor a subreddit for threads that match your interests. We're going to solve that problem in this guide by building a bot that can watch a user-specified subreddit for new threads and alert the user if a thread has been started that contains a keyword they specified. For example, maybe we want to watch r/webdev for new threads that have anything to do with Javascript.

Since we want to look for new threads, we’re going to use the [r/subreddit]/new endpoint. In this case “subreddit” is the name of the subreddit we want to pull information from. This endpoint also has several options:

after/before - By specifying the fullname of a thread we can slice the data we get back. This would, for example, allow us to ignore any threads we’ve already pulled in a previous request.
limit - As the name implies this lets us put a limit on how many threads we get back. The default is 25 and the maximum value is 100.
count - This is the amount of threads we’ve already seen. Basically another way to avoid seeing threads we’ve already processed.
show - The only value for this is “all” which lets us get back hidden threads. For our purposes this isn’t really needed.

For better or for worse, we get an enormous amount of data back. For our purposes we don’t care about most of it, although you can refer to the Reddit object documentation here to see what's returned. Assuming we’ve named our JSON response data “res” then the information for each thread object is located under res[‘data’][‘children’]. The first thread is at index [0] and continues to N - 1 where N is the number of threads requested. The details for each thread are located under the [‘data’] key. The data we’re interested in are the thread’s title, name (which is the fullname we’ll need for the before/after options on our subsequent request), and link. So to get the data we need:


var res = JSON.parse(body);
var fullname = res[‘data’][‘children’][0][‘data’][‘name’];
var title = res[‘data’][‘children’][0][‘data’][‘title’];
var link = res['data']['children'][0]['data']['title'];

Of course, we want to examine more than just one thread, we want to look at all the new threads. So to do that we’ll loop over the JSON data and put it into a dictionary:


var rawThreadData = json['data']['children'];
var threads = [];

for (thread in rawThreadData) {
    var curThread = {};
    curThread['title'] = rawThreadData[thread]['data']['title'];
    curThread['fullname'] = rawThreadData[thread]['data']['name'];
    curThread['link'] = rawThreadData[thread]['data']['permalink'];
    threads.push(curThread);
}

Now we’re getting results back, but if we repeat the query we’re going to get a lot of stuff we’ve already seen back. The solution is to attach the “before” parameter to our GET request. We want to use the “before” parameter instead of “after” because we only want threads that we haven’t seen yet; if we used “after” we’d keep going backward through all threads that show up under “new”. With the Reddit API parameters are sent by attaching them to the URL. For example:


https://reddit.com/r/politics/new.json?limit=100&after=t3_6byrml

There’s a few things to notice here. First is that we attach our parameters after the “.json” specifier. Second is that parameters are sent as queries. The first parameter is preceeded by a “?” and then subsequent parameters are preceeded by an “&”. In this example we're fetching the most recent 100 threads starting from the thread with the fullname "t3_6byrml" in the "politics" subreddit.

At this point we can let our code repeat infinitely so that we’re always monitoring for new threads. However, we have to be careful. If we just let the program execute at its own pace it will violate the API’s rules about how often we’re allowed to query the API. Therefore, we have to tell our bot to wait at least a second between queries. To accomplish this we’ll use the timeout function which will take our calling function as a callback.


setTimeout(function() {
    getThreads(token, before, limit);
}, 60000);

The above code assumes that we've wrapped all the code so far into its own function "getThreads" which takes the authentication token as an argument along with the thread we should use as our starting point to look for new threads, and the amount of threads we want to get back. We've set the timeout to one minute because on most subreddits it's unlikely that new threads will have been posted faster than that.

We still have to actually do something with these threads we’ve collected though. Namely, we need to parse them to look for a search term and then only keep those threads that match. We’ll make another function for this:


function processThreads(token, threads) {
    var searchTerm = "YOUR_SEARCH_TERM";
    var re = new RegExp(searchTerm, 'i');
    var hits = [];

    for (thread in threads) {
        if (re.test(threads[thread]['title'])) {
            var hit = {};
            hit['title'] = threads[thread]['title']; 
            hit['link'] = threads[thread]['link'];
            hits.push(hit);
        }
    }

    if (hits.length) {
        console.log("Got " + hits.length + " hits");
    } else {
        console.log("No hits");
    }
}

Going over this code very quickly, we use a regular expression to look for our term. If we get a hit, we add the title and link of the thread to a dictionary which is then added to an array.

At this point we’ve now successfully used the Reddit API to find newly posted threads that match a search term the user is interested in. But what good is collecting this information if we’re not going to do anything with it? On the next page I'll show you one example of something we can do.