Description
I wanted to open this up for discussion.
As it stands, the tweet parser has a lot of great functionality for working with tweets and makes much of the tweet data readily accessible. However, there are other use cases for tweet data that go beyond parsing predictable fields out of a tweet.
For example, anyone who wants to use tweets with ANY geolocation data has to deal with both exact positions (lat, long) and GeoJSON place coordinates (a bounding box and associated info). I wrote some code that unifies the two into easily plottable data: a function that returns the precise position if it exists, or otherwise a coordinate inside the bounding box.
The code looks like this:
from tweet_parser.tweet_checking import is_original_format

try:
    import numpy as np

    def mean_bbox(bbox):
        """Average the bounding-box corners, column by column."""
        return list(np.array(bbox).mean(axis=0))
except ImportError:
    def mean_bbox(bbox):
        """Pure-Python fallback: average each coordinate across the corners."""
        return [sum(axis) / len(bbox) for axis in zip(*bbox)]


def get_profile_geo_coords(tweet):
    """Returns (lat, long) from the user's profile location, or None."""
    geo = (tweet.profile_location or {}).get("geo") or {}
    coords = geo.get("coordinates")  # in [LONG, LAT]
    if coords:
        long, lat = coords
        return lat, long
    return None


def get_place_coords(tweet, est_center=False):
    """
    Places are formal spots that define a bounding box around a place.
    The bounding box is a list of [long, lat] pairs, one per corner.
    With est_center=True, returns the mean of the corners as [long, lat].
    Returns None if the tweet has no place bounding box.
    """
    if is_original_format(tweet):
        box = (tweet.get("place") or {}).get("bounding_box")
    else:
        box = (tweet.get("location") or {}).get("geo")
    if not box or not box.get("coordinates"):
        return None
    bbox = box["coordinates"][0]
    return mean_bbox(bbox) if est_center else bbox


def get_exact_geo_coords(tweet):
    """Returns the exact [lat, long] point, or None if the tweet has none."""
    geo = tweet.get("geo")
    if geo is None:
        return None
    # coordinates.coordinates is [LONG, LAT]
    # geo.coordinates is [LAT, LONG]
    # The same "geo" key is read for both tweet formats.
    return geo.get("coordinates")


def get_a_geo_coordinate(tweet):
    """Returns a (lat, long) tuple that corresponds to a point within the bounding box of this tweet
    or the precise geolocation if it exists. Returns None if the tweet has neither.
    """
    geo = get_exact_geo_coords(tweet)
    if geo:
        # Checking the pair itself, not `lat`, keeps points on the equator (lat == 0).
        lat, long = geo
        return lat, long
    center = get_place_coords(tweet, est_center=True)
    if center is None:
        return None
    long, lat = center
    return lat, long
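To make the bounding-box fallback concrete, here is a minimal standalone sketch of the center estimate. It does not use tweet_parser at all: `sample_place` is a made-up fragment shaped like an original-format `place` object, and `bbox_center` is a hypothetical helper name used only for illustration.

```python
# Hypothetical original-format "place" fragment; the coordinates are made up.
sample_place = {
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[
            [-105.30, 39.95],
            [-105.30, 40.10],
            [-105.18, 40.10],
            [-105.18, 39.95],
        ]],
    }
}


def bbox_center(place):
    """Average the bounding-box corners and return the result as (lat, long)."""
    corners = place["bounding_box"]["coordinates"][0]  # each corner is [long, lat]
    long = sum(corner[0] for corner in corners) / len(corners)
    lat = sum(corner[1] for corner in corners) / len(corners)
    return lat, long


lat, long = bbox_center(sample_place)
print(round(lat, 3), round(long, 3))
```

The corner average is only a rough estimate of where the tweet came from, but it gives every place-tagged tweet a single plottable point in the same (lat, long) order as an exact geotag.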
Should we have an auxiliary module in this package for storing this kind of code? I think it could be useful in the long term for centralizing our efforts, sharing code, and helping end users get work done quickly. I am also not at all opposed to putting this type of code elsewhere.