ITI0011:Twitter homework

Revision as of 6 October 2014, 08:03, by Ago

This is the English version of the second homework. The Estonian version is here: ITI0011:Säuts.

General

Deadline: 21 or 23 October (depending on your practice time)
Defending your homework a week or more before the deadline gives you +1 point.

General requirements:

  • Proper exception handling: stack traces must never appear while your program is running.
  • Every class, its variables, and its methods must be documented with Javadoc-style comments. Failure to comply with this requirement costs 1 point.

The goal of the homework is to create an interactive program that handles tweets using the Twitter API. The basic part gives you 5 points, and each piece of additional functionality gives you +1 point, up to a maximum of 11 points this time. The basic part is the absolute minimum that must be fulfilled in order to pass the assessment of this homework.

The required functionality:

  • Program accepts command-line arguments and everything is controllable from the arguments (example: java Twitter -location Tallinn -count 40 -sort date desc)
  • If the program is executed without arguments, an interactive command line is started, where the user can write commands ("> query Tallinn" or "> sort date desc")
  • The program has a proper manual (if a command is written incorrectly or some parameters are missing, the help text should be shown; "java Twitter --help", for example, should also print the manual).
  • The program accepts a location, finds the latest public tweets for that location and prints them.
  • The number of tweets requested can be changed.
  • Downloaded tweets can be sorted and searched for.

The requirements are described in detail below. The list above is a general overview of the program (some further features also need to be implemented).
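
As an illustration of the argument handling listed above, here is a minimal sketch in Java. The class name ArgumentParser and the decision to treat a leading value without an option as the location are assumptions made for this example, not part of the assignment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that groups command-line tokens by the option preceding them. */
final class ArgumentParser {

    /**
     * Splits arguments such as {"-location", "Tallinn", "-sort", "date", "desc"}
     * into option names and their values.
     *
     * @param args raw command-line arguments
     * @return option name (without leading dashes) mapped to its values
     */
    static Map<String, List<String>> parse(String[] args) {
        Map<String, List<String>> options = new LinkedHashMap<>();
        List<String> current = null;
        for (String arg : args) {
            if (arg.startsWith("-")) {
                // "-count" and "--help" both become plain option names.
                String name = arg.replaceFirst("^-+", "");
                current = new ArrayList<>();
                options.put(name, current);
            } else {
                if (current == null) {
                    // Assumption: a leading value without an option names the location.
                    current = options.computeIfAbsent("location", k -> new ArrayList<>());
                }
                current.add(arg);
            }
        }
        return options;
    }
}
```

A caller could check the returned map for "help" or for missing mandatory options and print the manual in those cases.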

Extra task: testable code (1p)

This homework was designed to give students as much flexibility and room for creativity as possible. The downside is that this freedom makes it hard to test your program with automated tests. You can therefore earn an additional bonus point (the maximum, counting all bonus points, is still 11) by making your code testable. For this task we provide a set of predefined interfaces that your solution has to implement. Interfaces let us describe formally which methods must be present in your solution, which arguments they take and what they return. Once these interfaces are implemented, the methods can be called with various input parameters and tested in a semi-automated way.

Attention! If you have already finished your solution (or some part of it) and want to do this extra task, you may need to re-implement a major part of your solution. If you aim to do this extra task, plan the structure of your program beforehand so that it complies with these requirements. As this task requires extra effort, we offer 1 bonus point for it.

More information: ITI0011:Twitter testable code.

Main part - 5p

The program makes a request to the Twitter API to search public tweets. The latest public tweets are downloaded for the location the user specifies on the command line. The tweets are then presented to the user.

Note that you only download the tweets from the Twitter API (nothing special needs to be done to get the latest tweets, as this is the default behavior). When presenting the tweets to the user, just print them in the order you received them.

Required functionality:

  • Read the location from the command line (e.g. "Tallinn")
  • For the location, find the geographical coordinates (latitude and longitude) using the OpenStreetMap API
  • Use the bounding box information to calculate an appropriate radius.
  • Send the coordinates and the radius to the Twitter API
  • Read the response from the Twitter API into objects.
  • Print out the tweets.


Location coordinates

You can use the OpenStreetMap community tool Nominatim (Nominatim wiki). Given a location name, it returns information about that location (coordinates, bounding box and more).

Example request: http://nominatim.openstreetmap.org/search?q=Tallinn&format=xml

You will get a response (only partially shown):

<searchresults timestamp="Sat, 13 Sep 14 21:47:21 +0000" 
attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright" 
querystring="Tallinn" polygon="false" 
exclude_place_ids="98174326,11438224,6000303521,6919504,6893196,86869124,15103978,5983246058" 
more_url="http://nominatim.openstreetmap.org/search?format=xml&amp;exclude_place_ids=98174326,11438224,6000303521,6919504,6893196,86869124,15103978,5983246058&amp;accept-language=en-US,en;q=0.5&amp;q=Tallinn">
<place place_id="98174326" osm_type="relation" osm_id="2164745" place_rank="16" 
boundingbox="59.351806640625,59.5915794372559,24.5501689910889,24.9262847900391" lat="59.4372155" 
lon="24.7453688" display_name="Tallinn, Harju maakond, Estonia" class="place" type="city" 
importance="0.7819722223575" icon="http://nominatim.openstreetmap.org/images/mapicons/poi_place_city.p.20.png"/>
...
</searchresults>

The results contain several places named Tallinn (or whose name includes Tallinn). For this homework you only need the first result, so read the first "place" element. The Twitter query needs a location (latitude and longitude) and a radius. Use the attributes "lat" and "lon" from the search result as the center of the Twitter search. For the radius, use the "boundingbox" attribute, which gives the bounding box around the location. In the given example, you should look for lat="59.4372155", lon="24.7453688", boundingbox="59.351806640625,59.5915794372559,24.5501689910889,24.9262847900391".
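
Reading the first "place" element could be sketched as follows, using the XML parser that ships with Java. The class and field names are hypothetical, and the order of the boundingbox values (south, north, west, east) is taken from the example response above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the location search response. */
final class NominatimParser {

    /** Center and bounding box of one place, all in degrees. */
    static final class Place {
        final double lat, lon, south, north, west, east;

        Place(double lat, double lon, double south, double north, double west, double east) {
            this.lat = lat; this.lon = lon;
            this.south = south; this.north = north;
            this.west = west; this.east = east;
        }
    }

    /**
     * Extracts the first "place" element from the XML response.
     *
     * @param xml response body
     * @return the first place found
     * @throws IllegalArgumentException if the response is unreadable or contains no place
     */
    static Place firstPlace(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList places = doc.getElementsByTagName("place");
            if (places.getLength() == 0) {
                throw new IllegalArgumentException("No place found in the response");
            }
            Element place = (Element) places.item(0);
            String[] box = place.getAttribute("boundingbox").split(",");
            return new Place(
                    Double.parseDouble(place.getAttribute("lat")),
                    Double.parseDouble(place.getAttribute("lon")),
                    Double.parseDouble(box[0]), Double.parseDouble(box[1]),
                    Double.parseDouble(box[2]), Double.parseDouble(box[3]));
        } catch (IllegalArgumentException e) {
            throw e; // already has a readable message (includes malformed numbers)
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read the location response", e);
        }
    }
}
```

The caller can catch IllegalArgumentException and show a short message, so that no stack trace reaches the user.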

For the radius, you could simply compute the distance spanned by the bounding box in latitude and longitude. Beware that a 0.1 degree difference in longitude corresponds to a different distance at different locations on Earth. For this homework, the radius does not need to be very accurate, but it should still vary with the size of the city (New York > .. > Tallinn > Haapsalu). Don't spend too much time on the radius calculation: computing it with 1 m accuracy won't give you any extra points.
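
One possible way to turn the bounding box into a radius is to take half of the great-circle distance between its opposite corners. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. This is only one reasonable choice, not a required method, and the class name is hypothetical.

```java
/** Hypothetical radius estimate based on the bounding box diagonal. */
final class RadiusEstimator {

    /** Mean Earth radius in kilometres. */
    private static final double EARTH_RADIUS_KM = 6371.0;

    /**
     * Great-circle distance between two points (haversine formula).
     *
     * @return distance in kilometres
     */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /**
     * Half of the bounding box diagonal, used as a rough search radius.
     *
     * @param south minimum latitude
     * @param north maximum latitude
     * @param west  minimum longitude
     * @param east  maximum longitude
     * @return radius in kilometres
     */
    static double radiusKm(double south, double north, double west, double east) {
        return distanceKm(south, west, north, east) / 2;
    }
}
```

Because the haversine formula accounts for latitude, the same longitude difference yields a smaller distance near the poles than near the equator. For the Tallinn bounding box above this gives a radius of roughly 17 km.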

In short, you need to make a query, read the response, and translate the response into center coordinates and a radius.

Note: You could use some other service to get city coordinates (for example Google Maps API).

Twitter API

The Twitter API (https://dev.twitter.com/docs/api/1.1) allows a program to make automated queries to the social network Twitter. For this homework, you don't need to be an active Twitter user, but you still need an account to make public queries.

To use the Twitter API, you need a Twitter account and an App (application) registered under that account. Once you have an account, you can see your applications here: https://apps.twitter.com/ (the same link is at the bottom of dev.twitter.com, in the "TOOLS" list). On the applications page, create a new application. When creating an app, you can provide any web page link you want (for example the course web page).

After creating an application, you will see it on the application page. If you open your application (from the list) and open "API keys" tab, you will see "API key" and "API secret". You need those values to make queries to Twitter API.

From the API, we will use the search query, which is described here: https://dev.twitter.com/rest/public/search and https://dev.twitter.com/rest/reference/get/search/tweets

An example query to get Tallinn tweets within 1km radius: https://api.twitter.com/1.1/search/tweets.json?q=&geocode=59.4372155,24.7453688,1km&result_type=recent
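
The geocode value in the query above has the form latitude,longitude,radius. A small sketch of building it in Java is shown below. The helper name is hypothetical, and rounding the radius to whole kilometres is an assumption of this example. Locale.ROOT is used so that a decimal comma from the system locale cannot end up in the value.

```java
import java.util.Locale;

/** Hypothetical helper that formats the geocode value of the search query. */
final class GeocodeParameter {

    /**
     * Builds a value such as "59.4372155,24.7453688,1km".
     *
     * @param lat      center latitude in degrees
     * @param lon      center longitude in degrees
     * @param radiusKm search radius in kilometres
     * @return the formatted geocode value
     */
    static String format(double lat, double lon, double radiusKm) {
        // Assumption: use whole kilometres and never less than 1 km.
        long km = Math.max(1, Math.round(radiusKm));
        return String.format(Locale.ROOT, "%.7f,%.7f,%dkm", lat, lon, km);
    }
}
```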

The given link does not work in the browser, because you are not authenticated properly.

Some information about Twitter authentication can be seen here: https://dev.twitter.com/oauth

Doing all the authentication and connection handling manually is a lot of work. We recommend using a library that does most of the work for you, for example http://twitter4j.org/ , which makes authentication and queries easier. If you want, you can use some other library.

To use twitter4j you need the file twitter4j-core-4.0.2.jar. If you download the zip file, the jar file is located in the "lib" folder. At the time of writing, 4.0.2 was the latest version; with a newer version the file name changes accordingly. Add this jar file to your project: Project Properties > Java Build Path > Libraries > Add External JARs, then browse to the jar file.

To configure twitter4j, you need the following code:

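		// Imports needed at the top of the source file:
		import twitter4j.TwitterException;
		import twitter4j.TwitterFactory;
		import twitter4j.auth.OAuth2Token;
		import twitter4j.conf.ConfigurationBuilder;

		// Application-only authentication: only the API key and secret are needed.
		ConfigurationBuilder cb = new ConfigurationBuilder();
		cb.setDebugEnabled(true)
		  .setApplicationOnlyAuthEnabled(true)
		  .setOAuthConsumerKey(TWITTER_CUSTOMER_KEY)
		  .setOAuthConsumerSecret(TWITTER_CUSTOMER_SECRET);

		TwitterFactory tf = new TwitterFactory(cb.build());
		twitter4j.Twitter twitter = tf.getInstance();

		try {
			// Request the bearer token used for the following queries.
			OAuth2Token token = twitter.getOAuth2Token();
		} catch (TwitterException e) {
			// Show a readable message instead of a stack trace (see the general requirements).
			System.err.println("Twitter authentication failed: " + e.getMessage());
		}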

In the code snippet above, TWITTER_CUSTOMER_KEY and TWITTER_CUSTOMER_SECRET are constants whose values are taken from the Twitter application web page (the API key and the API secret, respectively). Of course, you can use a configuration file or some other method to set the API keys (you can read more about that on the library's web page).

You need to find out yourself how to actually make the query, but the web page has a lot of examples. I also recommend checking the project's GitHub page, where you can find the source code with tests. The tests show different usages of the library.

If you have a Twitter account, you can use it to create the app. The general goal is to get public tweets, but you could instead get your friends' tweets. This does not give you extra points. Note that to request your friends' tweets, you need to add your account keys. More information can be found here: http://twitter4j.org/en/configuration.html . If you don't provide accessToken information (as in the example above), you can only get public tweets (which is OK for this homework).


Extra task: location buffering (2p)

The functionality required to be implemented in this task includes the following:

  • Results of location queries are buffered in a file (local cache) so that there is no need to query the web service, in case the result is present in the cache.
  • One should be able to edit the cache file manually (using some sort of an editor) and populate it with some additional entries (i.e. “home”, “TUT”, etc.)

You are expected to use the file named “kohad.csv” as the local cache. This file must reside in a path accessible by your program (e.g. project classpath) and should contain CSV-formatted rows of text:

   ametlik_nimi,latitude,longitude,raadius_km,alternatiivnenimi_1,...,alternatiivnenimi_N

Alternatively, it may contain rows where some of the fields (e.g. coordinates or radius) are empty:

   ametlik_nimi,,,,alternatiivnenimi_1,...,alternatiivnenimi_N

where:

  • ametlik_nimi is the official name of the location of interest
  • alternatiivnenimi_1 ... alternatiivnenimi_N are alternative names which the end user might wish to add. For instance, a user might give a certain location the alternative name “home”. The search is still done using the official name of the location (the very first field)
  • latitude, longitude and raadius_km are the center coordinates and the radius in kilometers; they may be empty

The program uses the local cache file in the following way:

  • Before querying the API the program first tries to locate the location of interest in the local cache file.
    • If the location was found and the position and radius data is present, this data is used in a query to Twitter API. No attempt to determine the geolocation is undertaken.
    • If the location was found, but the position and radius data is not present, then:
      • The program queries the geolocation API to get the position of the location of interest, extracts the coordinates from the response and computes radius out of it.
      • The program fills in the corresponding entry in the local cache file with the data obtained in the previous step. The next time the same query is launched, it no longer triggers a query to the geolocation API.
    • If the location was not found, then the program is expected to make a new query regarding the location and to cache the data regarding the query in the local cache.

Examples:

1) kohad.csv:

tallinn,59.4,24.5,10,home,ttü

When "Tallinn" is queried, the coordinates 59.4 and 24.5 and a radius of 10 km are used for the Twitter API. The same happens if "home" is queried.

2) kohad.csv

pärnu,,,,grandma,summerhouse

When "pärnu", "grandma" or "summerhouse" is queried, "pärnu" is used for the location search. The coordinates are queried and the radius is calculated. As a result, the same row in the cache file is filled in with the coordinates and the radius, for example:

pärnu,58.3,24.5,5,grandma,summerhouse

3) kohad.csv

tallinn,59.4,24.5,10,home,ttü

When "pärnu" is queried, a location search is done to get the coordinates, and the radius is calculated. As a result, a new row is added to the cache file:

tallinn,59.4,24.5,10,home,ttü
pärnu,58.3,24.5,5

The solution to this extra task, provided that it is done entirely and correctly, gives you 2 points.
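
A single row of kohad.csv could be represented as sketched below. The class and method names are hypothetical. Matching names without regard to letter case is inferred from example 1, where the query "Tallinn" matches the row "tallinn".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory form of one row of the cache file. */
final class CacheEntry {
    final String officialName;
    final Double latitude;   // null when the field is empty
    final Double longitude;  // null when the field is empty
    final Double radiusKm;   // null when the field is empty
    final List<String> aliases;

    CacheEntry(String officialName, Double latitude, Double longitude,
               Double radiusKm, List<String> aliases) {
        this.officialName = officialName;
        this.latitude = latitude;
        this.longitude = longitude;
        this.radiusKm = radiusKm;
        this.aliases = aliases;
    }

    /**
     * Parses a row such as "tallinn,59.4,24.5,10,home".
     *
     * @throws IllegalArgumentException if the row has fewer than four fields
     *                                  or a malformed number
     */
    static CacheEntry parse(String line) {
        String[] fields = line.split(",", -1); // -1 keeps the empty fields
        if (fields.length < 4) {
            throw new IllegalArgumentException("Malformed cache row: " + line);
        }
        List<String> aliases = new ArrayList<>();
        for (int i = 4; i < fields.length; i++) {
            if (!fields[i].trim().isEmpty()) {
                aliases.add(fields[i].trim());
            }
        }
        return new CacheEntry(fields[0].trim(), number(fields[1]),
                number(fields[2]), number(fields[3]), aliases);
    }

    private static Double number(String field) {
        String text = field.trim();
        return text.isEmpty() ? null : Double.valueOf(text);
    }

    /** True if the query equals the official name or any alternative name. */
    boolean matches(String query) {
        String q = query.trim();
        if (officialName.equalsIgnoreCase(q)) {
            return true;
        }
        for (String alias : aliases) {
            if (alias.equalsIgnoreCase(q)) {
                return true;
            }
        }
        return false;
    }

    /** True if the coordinates and radius are all present, so no location query is needed. */
    boolean hasPosition() {
        return latitude != null && longitude != null && radiusKm != null;
    }
}
```

When hasPosition() returns false, the program would query the geolocation API using officialName and then rewrite the row with the obtained values.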

Extra task: Sorting (1p)

Tweets are presented in sorted order. Sorting is done by one of the following criteria: author, tweet creation date, or tweet text. It must be possible to sort in ascending as well as descending order. Examples of program invocation parameters:

  • java Twitter Tallinn -sort author
  • java Twitter Tallinn -sort date
  • java Twitter Tallinn -sort date desc
  • java Twitter Tallinn -sort content

The solution to this task will give you 1 point, provided that it was done correctly and entirely.
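
The sort criteria map naturally onto Java comparators. The sketch below assumes a hypothetical minimal Tweet class. In a real solution the values would come from the objects returned by your Twitter library. Ignoring letter case for author and content is also an assumption of this example.

```java
import java.util.Comparator;
import java.util.Date;

/** Hypothetical helper that builds a comparator from the "-sort" parameters. */
final class TweetSorter {

    /** Hypothetical minimal tweet holder used only for this example. */
    static final class Tweet {
        final String author;
        final Date created;
        final String text;

        Tweet(String author, Date created, String text) {
            this.author = author;
            this.created = created;
            this.text = text;
        }
    }

    /**
     * Returns a comparator for one of the fields "author", "date" or "content".
     *
     * @param field      sort criterion
     * @param descending true for descending order
     * @throws IllegalArgumentException if the field is unknown
     */
    static Comparator<Tweet> comparator(String field, boolean descending) {
        Comparator<Tweet> result;
        switch (field) {
            case "author":
                result = Comparator.comparing((Tweet t) -> t.author, String.CASE_INSENSITIVE_ORDER);
                break;
            case "date":
                result = Comparator.comparing((Tweet t) -> t.created);
                break;
            case "content":
                result = Comparator.comparing((Tweet t) -> t.text, String.CASE_INSENSITIVE_ORDER);
                break;
            default:
                throw new IllegalArgumentException("Unknown sort field: " + field);
        }
        return descending ? result.reversed() : result;
    }
}
```

A list of tweets can then be ordered with list.sort(TweetSorter.comparator("date", true)), and an unknown field can be reported through the help text.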


Extra task: filtering (1p)

In addition to the location, it is possible to specify a search keyword and the size of the output (the number of tweets to display). Pass the number of tweets along with your query, then filter the results and display only the tweets matching the search keyword.

The solution to this task gives you 1 point.
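
The filtering step could look like the sketch below, which works on plain tweet texts to stay independent of any library. The class name is hypothetical, and matching without regard to letter case is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that keeps only the texts containing a keyword. */
final class KeywordFilter {

    /**
     * Filters already downloaded tweet texts.
     *
     * @param texts   tweet texts in their current order
     * @param keyword search keyword
     * @return the texts that contain the keyword, in the same order
     */
    static List<String> filter(List<String> texts, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matching = new ArrayList<>();
        for (String text : texts) {
            if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                matching.add(text);
            }
        }
        return matching;
    }
}
```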

Extra task: interactive shell mode and commands to control the execution flow of the program (1p)

This task aims to let the end user launch the program either in the so-called batch mode (by specifying parameters on the command line) or in the interactive shell mode (where the user types commands in an interactive shell-like environment). In interactive mode, the user enters a command and the program executes it immediately. Afterwards the user is presented with a prompt waiting for the next command.

In case the program was launched in the batch mode, it parses command line arguments, extracts the parameter values, executes its task and terminates. If no arguments were specified on the command line the program launches the interactive shell and executes user commands one by one.

Example execution from command line:

java Twitter Tallinn -count 50 -sort date desc -search tere

Example of interactive program:

   > setcount 50
   > query Tallinn
   > print
   > query Pärnu 10
   > sort date desc
   > print
   > search tere
   > print

The program must recognize at least the following commands:

  • Querying the Twitter API (e.g. “query”). It should be possible to pass the number of tweets to display along with the command. Alternatively, the number of tweets may be set with a separate command (e.g. “setcount 50”, in case the corresponding extra task has been accomplished), which remains valid for all subsequent queries until it is reset or unset.
  • [Only in interactive mode] Displaying results (e.g. “print”). It is acceptable if your program displays the results immediately after the query has been launched. However, with respect to other operations it would be convenient if you consider implementing the “print” functionality as a standalone command.
  • Sorting (e.g. “sort date desc”) in case the corresponding extra task has been accomplished.
  • Searching (e.g. “search hello”), in case the corresponding extra task has been accomplished.
  • Context help (e.g. “help”). If the end user provides an argument in an invalid format or a non-existent command, your program is expected to display a context-based help message in these and similar cases.

Additional requirement: your program is expected to accomplish the same tasks in the same way, regardless of how the behavior is triggered (in the interactive shell or through command-line arguments). In both cases the input should be converted to a common form and then executed immediately. The idea is to learn to design a program whose operational parameters can be specified in various ways, where the way they are specified does not affect the execution flow. This may come in handy if you one day want to add the possibility to process commands over a web-based API.
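
One way to obtain such a common form is to translate the batch arguments into the same command lines the interactive shell accepts, and then run both through a single executor. The sketch below is only an illustration. The class name, the fixed command order and the final "print" are assumptions, and the command names follow the interactive example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical converter from batch arguments to interactive-style commands. */
final class CommandNormalizer {

    /**
     * Converts e.g. {"Tallinn", "-count", "50", "-sort", "date", "desc"}
     * into the commands "setcount 50", "query Tallinn", "sort date desc", "print".
     *
     * @param args raw command-line arguments
     * @return commands in the order they should be executed
     * @throws IllegalArgumentException if no location was given
     */
    static List<String> fromArguments(String[] args) {
        Map<String, String> values = new HashMap<>();
        String option = "location"; // leading values name the location (assumption)
        for (String arg : args) {
            if (arg.startsWith("-")) {
                option = arg.substring(1);
                values.putIfAbsent(option, "");
            } else {
                // Join several values of one option with spaces, e.g. "date desc".
                values.merge(option, arg, (old, next) -> old.isEmpty() ? next : old + " " + next);
            }
        }
        if (!values.containsKey("location")) {
            throw new IllegalArgumentException("Location is missing");
        }
        List<String> commands = new ArrayList<>();
        if (values.containsKey("count")) {
            commands.add("setcount " + values.get("count"));
        }
        commands.add("query " + values.get("location"));
        if (values.containsKey("sort")) {
            commands.add("sort " + values.get("sort"));
        }
        if (values.containsKey("search")) {
            commands.add("search " + values.get("search"));
        }
        commands.add("print");
        return commands;
    }
}
```

With this design the interactive shell and the batch mode share one code path: each line typed at the prompt and each command produced here goes to the same method that executes a single command.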

Although this extra task may seem complex and demanding, it is not as difficult as it looks. Either way you have to implement one of the two suggested operational modes, the interactive one or the batch one. Otherwise you will not be able to claim your bonus points. If, during the initial planning phase, you take into account that your program may operate in two modes (acquiring commands from two different sources and then executing them), supporting both requires little additional work. We therefore strongly advise you to think through the design of your program and plan ahead before you start writing code.

This extra task gives you 1 point.