The Moz Links API: An Introduction

What precisely IS an API? They're those things you copy and paste long strange codes into Screaming Frog for links data on a Site Crawl, right?

I'm here to tell you there's so much more to them than that, if you're willing to take just a few little steps. But first, some basics.

What’s an API?

API stands for "application programming interface", and it's really just a way of... using a thing. Everything has an API. The web itself is a giant API that takes URLs as input and returns pages.

But specific data services like the Moz Links API have their own set of rules. These rules vary from service to service and can be a major stumbling block for people taking the next step.

When Screaming Frog gives you the extra links columns in a crawl, it's using the Moz Links API, but you can have this capability anywhere. For example, all that tedious manual work you do in spreadsheet environments can be automated, from data pull to formatting to emailing a report.

If you take this next step, you can be more efficient than your competitors, designing and delivering your own SEO services instead of relying upon, paying for, and being limited by the next proprietary product integration.

GET vs. POST

Most APIs you'll encounter use the same data transport mechanism as the web. That means there's a URL involved, just like with a website. Don't get scared! It's easier than you think. In many ways, using an API is just like using a website.

As with loading web pages, the request may be in one of two places: the URL itself, or in the body of the request. The URL is called the "endpoint" and the often invisibly submitted extra part of the request is called the "payload" or "data". When the data is in the URL, it's called a "query string" and indicates that the "GET" method is used. You see this all the time when you search:

https://www.google.com/search?q=moz+links+api <-- GET method

When the data of the request is hidden, it's called a "POST" request. You see this when you submit a form on the web and the submitted data doesn't show on the URL. When you hit the back button after such a POST, browsers usually warn you against double-submits. The reason the POST method is often used is that you can fit far more in the request with POST than with GET. URLs would get very long otherwise. The Moz Links API uses the POST method.
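To make the difference concrete, here's a minimal sketch using the requests library against httpbin.org, a public echo service used purely for illustration (it has nothing to do with the Moz API):

import requests

# GET: the data rides along in the URL as a query string
get_response = requests.get(
    "https://httpbin.org/get",
    params={"q": "moz links api"},  # becomes ?q=moz+links+api
)

# POST: the data travels in the hidden body of the request
post_response = requests.post(
    "https://httpbin.org/post",
    data='{"target": "moz.com/blog"}',  # payload, not visible in the URL
)

print(get_response.url)   # the query string is visible in the URL
print(post_response.url)  # no payload visible in the URL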

Making requests

A web browser is what traditionally makes requests of websites for web pages. The browser is a type of software known as a client. Clients are what make requests of services, and more than just browsers can make requests. The ability to make client web requests is often built into programming languages like Python, or can be broken out as a standalone tool. The most popular tools for making requests outside a browser are curl and wget.

We're discussing Python here. Python has a built-in library called urllib, but it's designed to handle so many different types of requests that it's a bit of a pain to use. There are other libraries that are more specialized for making requests of APIs. The most popular one for Python is called requests. It's so popular that it's used in almost every Python API tutorial you'll find on the web. So I'll use it too. This is what "hitting" the Moz Links API looks like:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

Provided that everything was set up correctly (more on that soon), this would produce the following output:

{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==',
 'results': [{'anchor_text': 'moz',
              'external_pages': 7162,
              'external_root_domains': 2026}]}

This is JSON data. It's contained within the response object that was returned from the API. It's not on the drive or in a file. It's in memory. So long as it's in memory, you can do stuff with it (often just saving it to a file).
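For example, here's one way you might save the parsed response to disk (a minimal sketch; the response variable is assumed to come from the request shown above):

import json

# Parse the body of the API response into a Python dict (in memory)
data = response.json()

# Write it to disk so it survives after the program exits
with open("moz_anchor_text.json", "w") as f:
    json.dump(data, f, indent=2)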

If you wanted to grab a piece of data within such a response, you could refer to it like this:

response.json()['results'][0]['external_pages']

This says: "Parse the response body as JSON, give me the first item in the results list, and then give me the external_pages value from that item." The result would be 7162.

NOTE: If you're actually following along and executing code, the above line won't work on its own. There's a certain amount of setup we'll do shortly, including installing the requests library and setting up a few variables. But this is the basic idea.
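If you'd like to try the navigation itself before doing that setup, you can paste the example output into a dict of your own and index into it the same way:

sample = {
    "next_token": "JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==",
    "results": [
        {"anchor_text": "moz", "external_pages": 7162, "external_root_domains": 2026}
    ],
}

print(sample["results"][0]["external_pages"])  # 7162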

JSON

JSON stands for JavaScript Object Notation. It's a way of representing data that's easy for humans to read and write. It's also easy for computers to read and write. It's a very common data format for APIs and has somewhat taken over the world since the older ways were too difficult for most people to use. Some people might call this part of the "restful" API movement, but the far more difficult XML format is also considered "restful" and everyone seems to have their own interpretation. Consequently, I find it best to just focus on JSON and how it gets in and out of Python.

Python dictionaries

I lied to you. I said that the data structure you were looking at above was JSON. Technically it's really a Python dictionary, or dict datatype object. It's a special kind of object in Python that's designed to hold key/value pairs. The keys are strings and the values can be any type of object. The keys are like the column names in a spreadsheet. The values are like the cells in the spreadsheet. In this way, you can think of a Python dict as a JSON object. For example, here's creating a dict in Python:

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

And here is the equivalent in JavaScript:

var my_json = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

Pretty much the same thing, right? Look closely. Key names and string values get double quotes. Numbers don't. These rules apply consistently between JSON and Python dicts. So as you might imagine, it's easy for JSON data to flow in and out of Python. This is a great gift that has made modern API work highly accessible to the beginner through a tool that has revolutionized the field of data science and is making inroads into marketing: Jupyter Notebooks.

Flattening data

But beware! As data flows between systems, it's not uncommon for it to subtly change. For example, the JSON data above might be converted to a string. Strings might look exactly like JSON, but they're not. They're just a bunch of characters. Sometimes you'll hear this called "serializing" or "flattening". It's a subtle point, but worth understanding, as it will help with one of the biggest hindrances with the Moz Links (and most JSON) APIs.

Objects have APIs

Actual JSON or dict objects have their own little APIs for accessing the data inside them. The ability to use these JSON and dict APIs goes away when the data is flattened into a string, but the string will travel between systems more easily, and when it arrives at the other end, it will be "deserialized" and the API will come back on the other system.
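Here's a quick sketch of what that looks like in practice (the data is made up for illustration):

import json

my_dict = {"name": "Mike", "age": 52}

# The dict has an API: you can look things up by key
print(my_dict["age"])     # 52
print(my_dict.keys())     # dict_keys(['name', 'age'])

# Flattened into a string, that API is gone...
flat = json.dumps(my_dict)
# flat["age"] would raise a TypeError: string indices must be integers

# ...until the string is deserialized back into a dict on the other side
restored = json.loads(flat)
print(restored["age"])    # 52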

Data flowing between systems

This is the concept of portable, interoperable data. Back when it was called Electronic Data Interchange (or EDI), it was a very big deal. Then along came the web, then XML, then JSON, and now it's just a normal part of doing business.

If you're in Python and you want to convert a dict to a flattened JSON string, you do the following:

import json

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

json_string = json.dumps(my_dict)

…which would produce the following output:

'{"name": "Mike", "age": 52, "city": "New York"}'

This looks almost the same as the original dict, but if you look closely you can see that single quotes are used around the entire thing. Another obvious difference is that you can line-wrap real structured data for readability without any ill effect. You can't do that so easily with strings. That's why it's presented all on one line in the above snippet.

Such stringifying processes are done when passing data between different systems because they are not always compatible. Normal text strings, on the other hand, are compatible with almost everything and can be passed on web requests with ease. Such flattened strings of JSON data are frequently referred to as the request.

Anatomy of a request

Again, here's the example request we made above:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

Now that you understand what the variable name json_string is telling you about its contents, you shouldn't be surprised to see that this is how we populate that variable:

data_dict = {
    "target": "moz.com/blog",
    "scope": "page",
    "limit": 1
}

json_string = json.dumps(data_dict)

…and the contents of json_string look like this:

'{"target": "moz.com/blog", "scope": "page", "limit": 1}'

This is one of my key discoveries in learning the Moz Links API. It's common to countless other APIs out there, but trips me up every time because it's so much more convenient to work with structured dicts than flattened strings. However, most APIs expect the data to be a string for portability between systems, so we have to convert it at the last moment before the actual API call occurs.

Pythonic loads and dumps

Now you may be wondering what a dump is doing in the middle of the code in that above example. The json.dumps() function is called a "dumper" because it takes a Python object and dumps it into a string. The json.loads() function is called a "loader" because it takes a string and loads it into a Python object.

The reason for what look like singular and plural options is that they are actually file and string options. If your data is in a file (or file-like object), you use json.load() and json.dump(). If your data is in a string, you use json.loads() and json.dumps(). The s stands for string. Leaving the s off means file.
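Here's a tiny side-by-side sketch of that distinction (writing to a throwaway file purely for illustration):

import json

my_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}

# String versions: the s is for string
as_string = json.dumps(my_dict)      # dict -> JSON string
back_again = json.loads(as_string)   # JSON string -> dict

# File versions: no s, they work with file objects
with open("payload.json", "w") as f:
    json.dump(my_dict, f)            # dict -> JSON written to a file
with open("payload.json") as f:
    from_file = json.load(f)         # JSON read from a file -> dict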

Don't let anybody tell you Python is perfect. It's just that its rough edges are not excessively objectionable.

Assignment vs. equality

For those of you completely new to Python or programming in general, what we're doing when we hit the API is called an assignment. The result of requests.post() is being assigned to the variable named response.

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

We're using the = sign to assign the value of the right side of the equation to the variable on the left side of the equation. The variable response is now a reference to the object that was returned from the API. Assignment is different from equality. The == sign is used for equality.

# This is assignment:
a = 1  # a is now equal to 1

# This is equality:
a == 1  # True, but relies on the line above having been executed

The POST method

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

The requests library has a function called post() that we're calling with 3 arguments. The first argument is the URL of the endpoint. The second argument is the data to send to the endpoint. The third argument is the authentication information to send to the endpoint.

Keyword parameters and their arguments

You may notice that some of the arguments to the post() function have names. Names are set equal to values using the = sign. Here's how Python functions get defined. The first argument is positional, both because it comes first and because there's no keyword. Keyword arguments come after position-dependent arguments. Trust me, it all makes sense after a while. We all start to think like Guido van Rossum.

def arbitrary_function(argument1, name="some default"):
    # do stuff
    pass

The name in the above example is called a "keyword" and the values that come in at those positions are called "arguments". Arguments are assigned to parameter names right in the function definition, so you can refer to either argument1 or name anywhere inside this function. If you'd like to learn more about the rules of Python functions, you can read about them here.
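For instance, here's a hypothetical call showing positional vs. keyword arguments (the function and values are made up for illustration):

def greet(greeting, name="world"):
    # greeting is positional; name is a keyword argument with a default
    return f"{greeting}, {name}!"

print(greet("Hello"))               # Hello, world!  (default used)
print(greet("Hello", name="Mike"))  # Hello, Mike!   (keyword supplied)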

Setting up the request

Okay, so let's help you do everything necessary for that success-assured moment. We've been showing the basic request:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

…but we haven't shown everything that goes into it. Let's do that now. If you're following along and don't have the requests library installed, you can do so with the following command from the same terminal environment from which you run Python:

pip install requests

Oftentimes Jupyter will have the requests library installed already, but in case it doesn't, you can install it with the following command from inside a Notebook cell:

!pip install requests

And now we can put it all together. There are only a few things here that are new. The most important is how we take 2 different variables and combine them into a single variable called AUTH_TUPLE. You will have to get your own ACCESSID and SECRETKEY from the Moz.com website.

The API expects these two values to be passed as a Python data structure called a tuple. A tuple is a list of values that don't change. I find it interesting that requests.post() expects a flattened string for the data parameter, but expects a tuple for the auth parameter. I suppose it makes sense, but these are the subtle things to understand when working with APIs.
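If tuples are new to you, here's a quick sketch (the values are placeholders, not real credentials):

# A tuple is created with parentheses and holds values that don't change
auth_tuple = ("mozscape-1234567890", "1234567890abcdef1234567890abcdef")

print(auth_tuple[0])     # items are accessed by position, like a list
# auth_tuple[0] = "new"  # ...but reassigning an item raises a TypeError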

Here's the full code:

import json
import requests
from pprint import pprint

# Set Constants
ACCESSID = "mozscape-1234567890"  # Replace with your access ID
SECRETKEY = "1234567890abcdef1234567890abcdef"  # Replace with your secret key
AUTH_TUPLE = (ACCESSID, SECRETKEY)

# Set Variables
endpoint = "https://lsapi.seomoz.com/v2/anchor_text"
data_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}
json_string = json.dumps(data_dict)

# Make the Request
response = requests.post(endpoint, data=json_string, auth=AUTH_TUPLE)

# Print the Response
pprint(response.json())

…which outputs:

{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==',
 'results': [{'anchor_text': 'moz',
              'external_pages': 7162,
              'external_root_domains': 2026}]}

Using all upper case for the AUTH_TUPLE variable is a convention many use in Python to indicate that the variable is a constant. It's not a requirement, but it's a good idea to follow conventions when you can.

You may notice that I didn't use all uppercase for the endpoint variable. That's because the anchor_text endpoint is not a constant. There are a number of different endpoints that can take its place, depending on what sort of lookup we want to do. The choices are (see the sketch after this list for how to swap one in):

  1. anchor_text

  2. final_redirect

  3. global_top_pages

  4. global_top_root_domains

  5. index_metadata

  6. link_intersect

  7. link_status

  8. linking_root_domains

  9. links

  10. top_pages

  11. url_metrics

  12. usage_data
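To use a different endpoint, you swap the final path segment of the URL and adjust the payload to whatever that endpoint expects. As a rough sketch (the url_metrics payload below is an assumption for illustration; check the Moz Links API documentation for each endpoint's exact parameters):

# Hypothetical sketch: same pattern, different endpoint
endpoint = "https://lsapi.seomoz.com/v2/url_metrics"
data_dict = {"targets": ["moz.com/blog"]}  # assumed payload shape; verify in the docs
json_string = json.dumps(data_dict)

response = requests.post(endpoint, data=json_string, auth=AUTH_TUPLE)
pprint(response.json())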

And that leads into the Jupyter Notebook that I prepared on this topic, located here on Github. With this Notebook you can extend the example I gave here to any of the 12 available endpoints to create a variety of useful deliverables, which will be the subject of articles to follow.
