Introduction to Cocoa: NSLinguisticTagger in NSBrief podcast #72

I’ve recorded an Introduction to Cocoa segment about NSLinguisticTagger for NSBrief podcast episode #72. On the surface it is a four-minute audio segment, but conceptualizing the idea, writing the script, and finally recording it (with several takes) took about a day to complete.

This is the software I used in the process:

  • BombingBrain Interactive’s Teleprompt+ for Mac – for displaying the script to read and time it properly.
  • Ittiam Systems’ ClearRecord Premium for iPhone – so that the audio comes out relatively noise-free even though I recorded it in a fairly noisy apartment near an intersection of two main streets.

For your benefit, I’ve included the script here; you can find my recording at the end of this post.

Hi! I’m Sasmito Adibowo and in the next four minutes you’re going to learn about NSLinguisticTagger – what it is, when you need it, and how to use it.

NSLinguisticTagger enables your OS X or iOS app to determine parts of speech in a body of text. NSLinguisticTagger can identify:

  • Token type – whether a token is a word, punctuation, or whitespace.
  • Script – the writing system that makes up the text: Latin (the standard keyboard characters plus a few accented letters), Cyrillic (used in Russia and neighboring countries that were members of the former Soviet Union), Han traditional or simplified (used in Chinese languages like Mandarin and other dialects), and so on.
  • Lemma – the root form of a word; for example, the singular form of a plural word.
  • Lexical Class – whether a word is a noun, verb, adjective, adverb, et cetera.
  • Name Type – whether a name is a place name, a person's name, or an organization name.

Those are called “tag schemes” in NSLinguisticTagger's parlance and you can find out more about them in the API documentation.

You’ll find NSLinguisticTagger useful when you need to process natural-language text. As far as I know, both OS X and iOS have good support for English, whereas OS X also supports some other languages. To see which tag schemes NSLinguisticTagger can identify for your favorite language, call the class method availableTagSchemesForLanguage: and pass it the two-letter language code. It returns an array of strings naming the tag schemes it can parse for that language. Be sure to run the test separately on OS X and iOS.
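In Swift, that check might look like the sketch below. The helper name supportedSchemes(for:) and the sample language codes are my own choices for illustration, and the output depends on the platform and OS version, so none is shown here.

```swift
import Foundation

// Hypothetical helper: returns the names of the tag schemes that
// NSLinguisticTagger reports for a given language code.
func supportedSchemes(for languageCode: String) -> [String] {
    return NSLinguisticTagger.availableTagSchemes(forLanguage: languageCode)
        .map { $0.rawValue }
}

// The language codes are only examples; run this on each platform you target.
for code in ["en", "fr", "ja"] {
    print(code, supportedSchemes(for: code))
}
```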

One example that may call for NSLinguisticTagger is a note-taking app. You could try to recognize people’s names and company names in the text and then automatically tag the entry with the names found. You could also use NSLinguisticTagger to normalize your user’s tags and use lemmas as the canonical form for each tag – so that “people” and “person” aren’t treated as two different tags.
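A minimal sketch of both ideas in Swift follows. The function names namedEntities(in:) and canonicalTag(for:) are hypothetical, and the lemma lookup falls back to the original word because the tagger may not return a lemma for a short string with no surrounding context.

```swift
import Foundation

// Hypothetical helper: collects person and organization names found in a note.
func namedEntities(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = text

    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    // .joinNames keeps multi-word names together as a single token.
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    let wanted: Set<NSLinguisticTag> = [.personalName, .organizationName]

    var names: [String] = []
    tagger.enumerateTags(in: fullRange, scheme: .nameType, options: options) { tag, tokenRange, _, _ in
        if let tag = tag, wanted.contains(tag) {
            names.append(nsText.substring(with: tokenRange))
        }
    }
    return names
}

// Hypothetical helper: uses the lemma of a user-entered tag as its canonical form.
func canonicalTag(for word: String) -> String {
    guard !word.isEmpty else { return word }
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = word
    let lemma = tagger.tag(at: 0, scheme: .lemma, tokenRange: nil, sentenceRange: nil)?.rawValue
    // Fall back to the word itself when no lemma is available.
    let base = (lemma?.isEmpty == false) ? lemma! : word
    return base.lowercased()
}
```

Whether a given name is recognized, or which lemma comes back for a given word, depends on the language models shipped with the OS, so treat the results as suggestions to the user and not as ground truth.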

Another example is an e-mail client or a feed-reading app – Twitter or app.net apps come to mind. Your app can try to detect the language of the text being displayed and offer a machine translation if the text is not in the user’s preferred language. Of course, NSLinguisticTagger won’t help you translate text between languages, but it can help determine which language the text belongs to.
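Language detection can be sketched with the language tag scheme, as below. Both function names are hypothetical, and treating "und" as "undetermined" is an assumption on my part about what the tagger reports when it cannot decide.

```swift
import Foundation

// Hypothetical helper: asks the tagger for the language tag at the start of the text.
func detectedLanguage(of text: String) -> String? {
    guard !text.isEmpty else { return nil }
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = text
    return tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue
}

// Hypothetical helper: decides whether to offer a translation.
// preferredLanguage is a language code such as "en".
func shouldOfferTranslation(for text: String, preferredLanguage: String) -> Bool {
    // Assumption: "und" means the language could not be determined.
    guard let language = detectedLanguage(of: text), language != "und" else {
        return false
    }
    return language != preferredLanguage
}
```

This only looks at the language at the first character; a message that mixes languages would need a pass over the whole text.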

NSLinguisticTagger also works great for making word clouds. You can use it to filter out noise like text in other languages, remove uninteresting words, and keep people’s names together so that each name displays as one word in the resulting word cloud.
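Here is a rough sketch of the counting step for such a word cloud. The function name wordCloudCounts(for:) and the choice of which classes count as "interesting" are my own assumptions; it covers only the part-of-speech filtering and name joining, not the filtering of other languages.

```swift
import Foundation

// Hypothetical helper: counts the words worth showing in a word cloud.
func wordCloudCounts(for text: String) -> [String: Int] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: 0)
    tagger.string = text

    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    // .joinNames makes a multi-word name come back as one token.
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    // Assumption: nouns, adjectives, and names are the interesting words.
    let keep: Set<NSLinguisticTag> = [.noun, .adjective, .personalName, .placeName, .organizationName]

    var counts: [String: Int] = [:]
    tagger.enumerateTags(in: fullRange, scheme: .nameTypeOrLexicalClass, options: options) { tag, tokenRange, _, _ in
        guard let tag = tag, keep.contains(tag) else { return }
        counts[nsText.substring(with: tokenRange), default: 0] += 1
    }
    return counts
}
```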

To use NSLinguisticTagger, you create an instance of it and specify which tag schemes you want the instance to parse. Then call setString: on it to provide the instance with the text that you want to process. Finally, to start tagging the text, call enumerateTagsInRange:scheme:options:usingBlock:. It calls your block repeatedly, passing it the range of each token it finds along with the corresponding tag.
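Those three steps translate to Swift roughly as follows. In Swift, setString: becomes the string property and the enumeration method is spelled enumerateTags(in:scheme:options:using:); the function name lexicalTags(in:) is hypothetical.

```swift
import Foundation

// Hypothetical helper: returns each word in the text with its lexical class.
func lexicalTags(in text: String) -> [(word: String, tag: String)] {
    // 1. Create a tagger for the tag schemes you need.
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)

    // 2. Give it the text to process.
    tagger.string = text

    // The tagger works with NSRange, so measure the text as an NSString.
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]

    // 3. Enumerate the tags; the block receives the tag and the token's range.
    var results: [(word: String, tag: String)] = []
    tagger.enumerateTags(in: fullRange, scheme: .lexicalClass, options: options) { tag, tokenRange, _, _ in
        guard let tag = tag else { return }
        results.append((word: nsText.substring(with: tokenRange), tag: tag.rawValue))
    }
    return results
}

for entry in lexicalTags(in: "The quick brown fox jumps.") {
    print(entry.word, entry.tag)
}
```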

I’ve posted an example project that shows how you can use NSLinguisticTagger. In short, the app is a syntax highlighter for natural language: it takes English-language text and colors the nouns and names in it. Search for ColorizeWords on GitHub or take a look at the show notes.

That’s all I have now for NSLinguisticTagger. Happy tagging!

Here is my recording of the podcast – you can find the complete episode on NSBrief’s podcast page.

 

 


