adding extra utility and parsing abilities to the library #28

@binaryaaron

I wanted to open this up for discussion.

The tweet parser as it stands has a ton of great functionality for working with tweets and makes a lot of tweet data readily accessible. However, there are other use cases for tweet data that go beyond parsing predictable fields out of a tweet.

For example, if a person wants to grab tweets with ANY geolocation data and use it, they have to deal with both the exact position (lat, long) and the GeoJSON place coordinates (a bounding box and associated info). I wrote some code that unifies the two to produce data that is easy to plot: a function that returns the precise position if it exists, or otherwise a coordinate inside the bounding box.

The code looks like this:

from tweet_parser.tweet_checking import is_original_format

try:
    import numpy as np
    # Average the corner points column-wise to estimate the box center.
    mean_bbox = lambda x: list(np.array(x).mean(axis=0))
except ImportError:
    # Pure-Python fallback: average each axis across the corner points.
    mean_bbox = lambda x: [sum(axis) / len(x) for axis in zip(*x)]

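def get_profile_geo_coords(tweet):
    """Return (lat, long) from the user's profile location, or None if absent."""
    geo = (tweet.profile_location or {}).get("geo") or {}
    coords = geo.get("coordinates")  # in [LONG, LAT]
    if not coords:
        return None
    long, lat = coords
    return lat, long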


def get_place_coords(tweet, est_center=False):
    """
    Places are formal spots that define a bounding box around a place.
    The bounding box is a list of [long, lat] corner points. Returns that
    list, its mean point if est_center is True, or None if there is no place.
    """
    if is_original_format(tweet):
        # original format: place.bounding_box.coordinates
        _place = tweet.get("place") or {}
        geo = _place.get("bounding_box") or {}
    else:
        # activity-streams format: location.geo.coordinates
        _place = tweet.get("location") or {}
        geo = _place.get("geo") or {}

    coords = geo.get("coordinates")
    bbox = coords[0] if coords else None
    if bbox is None:
        return None

    return mean_bbox(bbox) if est_center else bbox


def get_exact_geo_coords(tweet):
    """Return the tweet's exact position as [LAT, LONG], or None if it has none."""
    # This code reads "geo" for both payload formats.
    # coordinates.coordinates is [LONG, LAT]
    # geo.coordinates is [LAT, LONG]
    geo = tweet.get("geo")
    if geo is None:
        return None
    return geo.get("coordinates")


def get_a_geo_coordinate(tweet):
    """Returns a (lat, long) tuple: the precise geolocation if it exists, otherwise
    the estimated center of this tweet's bounding box, or None if it has neither.
    """
    geo = get_exact_geo_coords(tweet)
    if geo:
        lat, long = geo  # geo.coordinates is [LAT, LONG]
        return lat, long
    bbox = get_place_coords(tweet)
    if not bbox:
        return None
    long, lat = mean_bbox(bbox)  # bounding-box points are [LONG, LAT]
    return lat, long
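
To make the intended fallback concrete, here is a minimal, self-contained sketch of the same logic on plain lists, with no dependency on tweet_parser. The names `bbox_center` and `pick_coordinate`, their signatures, and the sample values are all hypothetical and only for illustration; they are not part of the library.

```python
def bbox_center(bbox):
    """Average the corner points of a bounding box, axis by axis."""
    return [sum(axis) / len(bbox) for axis in zip(*bbox)]


def pick_coordinate(exact=None, bbox=None):
    """Return (lat, long) from an exact point if given, else the bbox center.

    exact is assumed to be [LAT, LONG]; bbox is assumed to be a list of
    [LONG, LAT] corner points. Hypothetical helper, for illustration only.
    """
    if exact:
        lat, long = exact
        return lat, long
    if bbox:
        long, lat = bbox_center(bbox)
        return lat, long
    return None


# Made-up sample values, not real tweet payloads.
box = [[-105.0, 39.0], [-105.0, 41.0], [-103.0, 41.0], [-103.0, 39.0]]
print(pick_coordinate(exact=[40.0, -105.27]))  # exact point wins
print(pick_coordinate(bbox=box))               # falls back to the box center
print(pick_coordinate())                       # no geo data at all
```

The point of the sketch is the ordering: the exact point is read as [LAT, LONG] while the bounding-box corners are [LONG, LAT], so the swap has to happen in exactly one place before returning.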

Should we have an auxiliary module in here that allows for storing such code? I think it could be useful long-term in centralizing our efforts, sharing code, and helping end users get work done quickly. I am not at all opposed to putting this type of code elsewhere either.
