5,033 changes: 5,033 additions & 0 deletions Edited_Scripts/BTTF_edited.txt
6,757 changes: 6,757 additions & 0 deletions Edited_Scripts/FindingNemo_edited.txt
5,407 changes: 5,407 additions & 0 deletions Edited_Scripts/ForrestGump_edited.txt
4,052 changes: 4,052 additions & 0 deletions Edited_Scripts/Frozen_edited.txt
4,977 changes: 4,977 additions & 0 deletions Edited_Scripts/HSM_edited.txt
4,319 changes: 4,319 additions & 0 deletions Edited_Scripts/HarryPotter1_edited.txt
3,765 changes: 3,765 additions & 0 deletions Edited_Scripts/JurassicPark_edited.txt
2,379 changes: 2,379 additions & 0 deletions Edited_Scripts/KillBill1_edited.txt
5,836 changes: 5,836 additions & 0 deletions Edited_Scripts/LOTR1_edited.txt
2,658 changes: 2,658 additions & 0 deletions Edited_Scripts/Mulan_edited.txt
5,729 changes: 5,729 additions & 0 deletions Edited_Scripts/Rocky_edited.txt
4,603 changes: 4,603 additions & 0 deletions Edited_Scripts/StarWars_edited.txt
4,817 changes: 4,817 additions & 0 deletions Edited_Scripts/TheGodfather_edited.txt
4,436 changes: 4,436 additions & 0 deletions Edited_Scripts/TheMatrix_edited.txt
3,243 changes: 3,243 additions & 0 deletions Edited_Scripts/Up_edited.txt

Binary file added Final_Word_Clouds/BTTF.png
Binary file added Final_Word_Clouds/FindingNemo.png
Binary file added Final_Word_Clouds/ForrestGump.png
Binary file added Final_Word_Clouds/Frozen.png
Binary file added Final_Word_Clouds/HSM.png
Binary file added Final_Word_Clouds/HarryPotter.png
Binary file added Final_Word_Clouds/JurassicPark.png
Binary file added Final_Word_Clouds/KillBill.png
Binary file added Final_Word_Clouds/LOTR.png
Binary file added Final_Word_Clouds/Mulan.png
Binary file added Final_Word_Clouds/Rocky.png
Binary file added Final_Word_Clouds/StarWars.png
Binary file added Final_Word_Clouds/TheGodfather.png
Binary file added Final_Word_Clouds/TheMatrix.png
Binary file added Final_Word_Clouds/Up.png
Binary file added Images/BTTF.jpg
Binary file added Images/FindingNemo.jpg
Binary file added Images/ForrestGump.jpg
Binary file added Images/Frozen.jpg
Binary file added Images/HSM.jpg
Binary file added Images/HarryPotter1.jpg
Binary file added Images/JurassicPark.jpg
Binary file added Images/KillBill1.png
Binary file added Images/LOTR1.jpg
Binary file added Images/Mulan.jpg
Binary file added Images/Rocky.jpg
Binary file added Images/StarWars.jpg
Binary file added Images/TheGodfather.jpg
Binary file added Images/TheMatrix.jpg
Binary file added Images/Up.jpg
6,562 changes: 6,562 additions & 0 deletions Original_Scripts/BTTF.srt
8,694 changes: 8,694 additions & 0 deletions Original_Scripts/FindingNemo.srt
6,954 changes: 6,954 additions & 0 deletions Original_Scripts/ForrestGump.srt
5,396 changes: 5,396 additions & 0 deletions Original_Scripts/Frozen.srt
6,385 changes: 6,385 additions & 0 deletions Original_Scripts/HSM.srt
5,548 changes: 5,548 additions & 0 deletions Original_Scripts/HarryPotter1.srt
4,847 changes: 4,847 additions & 0 deletions Original_Scripts/JurassicPark.srt
3,088 changes: 3,088 additions & 0 deletions Original_Scripts/KillBill1.srt
7,619 changes: 7,619 additions & 0 deletions Original_Scripts/LOTR1.srt
3,436 changes: 3,436 additions & 0 deletions Original_Scripts/Mulan.srt
7,338 changes: 7,338 additions & 0 deletions Original_Scripts/Rocky.srt
5,952 changes: 5,952 additions & 0 deletions Original_Scripts/StarWars.srt
6,138 changes: 6,138 additions & 0 deletions Original_Scripts/TheGodfather.srt
5,779 changes: 5,779 additions & 0 deletions Original_Scripts/TheMatrix.srt
4,179 changes: 4,179 additions & 0 deletions Original_Scripts/Up.srt

Binary file added TM_ProjectWriteUp.pdf
109 changes: 109 additions & 0 deletions movie_subtitles.py
@@ -0,0 +1,109 @@
"""
Parse through already downloaded movie scripts/subtitles to reduce them to words only
"""

import re
import string
from string import digits
from nltk.corpus import stopwords

def edited_file_name(movie_script):
    """
    Build the file name for the edited version of a subtitle file

    >>> edited_file_name('StarWars.srt')
    'StarWars_edited.txt'

    >>> edited_file_name('BTTF.srt')
    'BTTF_edited.txt'
    """

    movie_clear = movie_script.replace('.srt', '')
    return movie_clear + '_edited.txt'

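def edit_all(movie_list):
    """
    Clean every subtitle file in movie_list and build a word histogram for each,
    so the whole list is handled in one call instead of one movie at a time.
    Returns a dict mapping each original file name to its histogram.
    """

    hists = {}
    for movie in movie_list:
        edit_script(movie)
        new_name = edited_file_name(movie)
        hists[movie] = process_script(new_name)
    return hists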

def edit_script(movie_script):
    """
    Strip non-dialogue text (digits, timestamps, tags, credits) from the
    original subtitle file and save the result as a new file
    """

    new_name = edited_file_name(movie_script)
    remove_digits = str.maketrans('', '', digits)  # translation table that deletes 0-9

    # Assumes UTF-8 subtitle files; undecodable bytes are dropped
    with open(movie_script, encoding='utf-8', errors='ignore') as original_script, \
            open(new_name, 'w', encoding='utf-8') as new_script:
        for line in original_script:  # read through original script
            line = line.translate(remove_digits)  # remove numbers (0123456789)
            line = re.sub('<.*?>', '', line)  # remove any text between '<>', including the brackets (e.g. <i> formatting tags)

            # keep only scripted words: skip timing lines ('-->'), subtitle credits, and other markers
            if '-->' not in line and 'Subtitle' not in line and '()' not in line and '^' not in line:
                new_script.write(line)  # write edited line to the new file

def process_script(file_name):
    """
    Open an edited script and count how often each non-stopword occurs
    """

    hist = dict()
    stop = set(stopwords.words('english'))  # common "boring" words (e.g. 'the', 'at', 'me'), built once per file
    with open(file_name, encoding='utf-8') as f1_script:
        for line in f1_script:
            process_line(line, hist, stop)
    return hist

def process_line(line, hist, stop):
    """
    Split one line of an edited script into words, strip punctuation,
    and add every word that is not a stopword to the histogram hist
    """

    line = line.replace('-', ' ')  # replace hyphens with spaces

    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)  # remove punctuation and redundant whitespace
        word = word.lower()  # lowercase so 'The' and 'the' count as the same word
        if word and word not in stop:  # skip empty strings and stopwords
            hist[word] = hist.get(word, 0) + 1  # observe frequency of words

# def most_common(hist):  # sorts hist by frequency of words (largest to smallest) instead of by word itself
#     t = []
#     for key, value in hist.items():
#         t.append((value, key))
#
#     t.sort(reverse=True)
#     return t

# def print_most_common(hist, num=10):  # prints out most common words
#     t = most_common(hist)
#     print('The most common words are:')
#     for freq, word in t[:num]:
#         print(word, '\t', freq)


# edit_script('StarWars.srt')
# hist = process_script('blahblah.txt')
# print_most_common(hist, 30)


movies1 = ['StarWars.srt', 'TheGodfather.srt', 'TheMatrix.srt', 'Rocky.srt', 'JurassicPark.srt', 'KillBill1.srt', 'LOTR1.srt', 'ForrestGump.srt']
movies2 = ['Frozen.srt', 'HSM.srt', 'Mulan.srt', 'HarryPotter1.srt', 'FindingNemo.srt', 'Up.srt', 'BTTF.srt']

if __name__ == '__main__':  # word_cloud_2.py imports movies1/movies2, so don't reprocess every script on import
    edit_all(movies1)
    edit_all(movies2)
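For quick testing without subtitle files or the NLTK corpus, the same two passes (`edit_script`, then `process_line`) can be sketched on an in-memory SRT sample. This is a minimal sketch under stated assumptions: `SAMPLE_STOP` is a tiny hypothetical stand-in for NLTK's English stopword list, `collections.Counter` replaces the plain dict, and the helper names `clean_srt` and `count_words` are illustrative, not part of this repository.

```python
import re
import string
from collections import Counter

# Hypothetical stand-in for nltk's stopwords.words('english')
SAMPLE_STOP = {"the", "a", "i", "you", "is", "to"}

# Two SRT cues: sequence number, timing line, text, blank separator
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
<i>Luke, I am your father.</i>

2
00:00:05,000 --> 00:00:07,500
No. No! That's not true!
"""


def clean_srt(text):
    """Mirror edit_script: drop digits, <tags>, and timing/credit lines."""
    remove_digits = str.maketrans("", "", string.digits)
    kept = []
    for line in text.splitlines():
        line = line.translate(remove_digits)  # sequence numbers become empty lines
        line = re.sub("<.*?>", "", line)      # strip <i>...</i> style tags
        if "-->" in line or "Subtitle" in line or "()" in line or "^" in line:
            continue                          # timing line or credit/marker line
        kept.append(line)
    return kept


def count_words(lines, stop):
    """Mirror process_line: split on hyphens/whitespace, strip punctuation, skip stopwords."""
    hist = Counter()
    for line in lines:
        for word in line.replace("-", " ").split():
            word = word.strip(string.punctuation + string.whitespace).lower()
            if word and word not in stop:
                hist[word] += 1
    return hist


if __name__ == "__main__":
    hist = count_words(clean_srt(SAMPLE_SRT), SAMPLE_STOP)
    print(hist.most_common(3))
```

Because the digits are removed before the `'-->'` check, timing lines such as `00:00:01,000 --> 00:00:04,000` still contain the arrow and are dropped, while bare sequence numbers survive only as empty lines that contribute no words.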
66 changes: 66 additions & 0 deletions word_cloud_2.py
@@ -0,0 +1,66 @@
"""
Constructs word cloud from edited movie scripts/subtitles
"""

from os import path
import matplotlib.pyplot as plt
from movie_subtitles import edited_file_name, movies1, movies2

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image

from scipy.misc import imread

def make_RD_WC(movie_script, image_name):
"""
Construct rough draft of word cloud
"""

d = path.dirname(__file__)

text = open(path.join(d, movie_script)).read() # open and read movie script for most frequent words excluding stopwords

color_WC = imread(path.join(d, image_name)) # open and read image shape/color
wordcloud = WordCloud(background_color="white", max_words=2000, mask=color_WC, max_font_size=40, random_state=42)

wordcloud.generate(text) # generate word cloud

image_colors = ImageColorGenerator(color_WC) # specify font colors based on image colors

plt.imshow(wordcloud.recolor(color_func=image_colors)) # recolor word cloud from default colors to image colors
plt.axis("off") # hide axis (numbers on x and y axis)
plt.show() # show final word clouds

def RD_WC_all(movie_list, image_list):
    """
    Build a word cloud for every movie, pairing each edited script with the
    image at the same position in image_list, instead of running one at a time
    """

    for movie, image in zip(movie_list, image_list):  # pair each script with its corresponding image
        make_RD_WC(movie, image)  # call word cloud generator

def edit_all_names(movie_list):
    """
    Create new list of edited movie file names to load updated files into word cloud generator

    >>> edit_all_names(['StarWars.srt', 'BTTF.srt'])
    ['StarWars_edited.txt', 'BTTF_edited.txt']
    """

    movies_edited = []
    for movie in movie_list:
        m = edited_file_name(movie)
        movies_edited.append(m)
    return movies_edited

images1 = ['StarWars.jpg', 'TheGodfather.jpg', 'TheMatrix.jpg', 'Rocky.jpg', 'JurassicPark.jpg', 'KillBill1.png', 'LOTR1.jpg', 'ForrestGump.jpg']
images2 = ['Frozen.jpg', 'HSM.jpg', 'Mulan.jpg', 'HarryPotter1.jpg', 'FindingNemo.jpg', 'Up.jpg', 'BTTF.jpg']

if __name__ == '__main__':
    movies_edited_1 = edit_all_names(movies1)
    movies_edited_2 = edit_all_names(movies2)

    RD_WC_all(movies_edited_1, images1)
    RD_WC_all(movies_edited_2, images2)
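`RD_WC_all` depends on `movies1`/`images1` (and `movies2`/`images2`) staying in exactly the same order. A hedged alternative is to pair files by base name, so a reordered or missing image raises an error instead of silently producing a cloud with the wrong shape. The helper `pair_by_stem` below is hypothetical, and it assumes edited scripts are always named `<stem>_edited.txt`, as `edited_file_name` produces.

```python
from os import path

SUFFIX = "_edited.txt"  # naming convention produced by edited_file_name


def pair_by_stem(scripts, images):
    """Pair 'X_edited.txt' with the image whose base name is 'X' (any extension)."""
    images_by_stem = {path.splitext(img)[0]: img for img in images}
    pairs = []
    for script in scripts:
        if not script.endswith(SUFFIX):
            raise ValueError("unexpected script name: %s" % script)
        stem = script[: -len(SUFFIX)]
        if stem not in images_by_stem:
            raise KeyError("no image found for %s (expected %s.*)" % (script, stem))
        pairs.append((script, images_by_stem[stem]))
    return pairs


if __name__ == "__main__":
    scripts = ["KillBill1_edited.txt", "StarWars_edited.txt"]
    images = ["StarWars.jpg", "KillBill1.png"]  # deliberately out of order
    for script, image in pair_by_stem(scripts, images):
        print(script, "->", image)
```

With such a helper, the loop in `RD_WC_all` could iterate over `pair_by_stem(movie_list, image_list)` instead of `zip`; note that it would also require the image names to match the script stems, which holds for the lists above but not for `Final_Word_Clouds/HarryPotter.png`-style names.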