Avengers: Endgame is a superhero movie produced by Marvel Studios based on the Marvel Comics superhero team, the Avengers. It is the sequel to The Avengers, released in 2012, Avengers: Age of Ultron, released in 2015, and Avengers: Infinity War, released in 2018.
Avengers: Endgame is directed by Anthony and Joe Russo, based on a screenplay written by Christopher Markus and Stephen McFeely. Robert Downey Jr., Mark Ruffalo, Chris Hemsworth, Scarlett Johansson, Jeremy Renner, Don Cheadle, Paul Rudd, Brie Larson, Danai Gurira, Benedict Wong, Jon Favreau, Bradley Cooper, and Josh Brolin are among the ensemble cast of this movie.
Filming began in 2017, and the production budget of $356 million made Avengers: Endgame one of the most expensive movies ever made. The film was released on April 26, 2019, in both IMAX and 3D formats, and it was highly praised for the direction, acting, visual effects, and musical score. It grossed over $2.79 billion in total and, by breaking box office records, became the highest-grossing film of all time at that point. With an approval rating of 94% and an average score of 8.29/10 on the Rotten Tomatoes website, the film was said to be entertaining and exciting. In this movie, the Avengers team up once more and, with the help of their allies, try to restore the balance in the universe and reverse the damage that Thanos has caused.
After a long year of waiting, Avengers: Endgame is finally here. I, like you and most of the world, will be rushing to the cinemas on day one to catch the movie and experience how the Avengers save the world and end a ten-year story. To calm my nerves and ease the wait, I wanted to relive the previous movie, Infinity War, but differently and interactively. And, since I am a data guy, of course, it had to involve data and a couple of buzzwords. The answer? Natural Language Processing, or NLP for short.
Using spaCy, an open source NLP library for Python designed to help us process and understand volumes of text, I analyzed the script of the movie to investigate the following concepts:
- Overall top 10 verbs, nouns, adverbs and adjectives from the film.
- Top verbs and nouns spoken by a particular character.
- The similarity between the spoken lines of each character pair, e.g., the similarity between Thor’s and Thanos’ lines.
In this article, I will discuss and show my findings while explaining with code how I did it with spaCy. Aren’t you interested in code and technical words? Today is your lucky day! The vocabulary and terms I’ll use here are mostly non-technical and user-friendly, so even if you have no experience in NLP, AI, machine learning or *insert buzzword here*, you should be able to grasp the main idea and the concepts I want to convey. So, feel free to ignore the pieces of code :)
The data, or text corpus as it is usually known in NLP, used for the experiment is the script of the movie, available at this link. However, before using the data, I had to clean it up. I removed some unnecessary things, such as the comments that describe an action or scene, as well as the name of the character who says the line (actually, the name was used to know who said what, but not as part of the actual corpus used for the analysis). Moreover, as part of the spaCy data processing step, I’m ignoring the terms that are labeled as stop words, in other words, the commonly used words. Also, I’m using only the lemma, that is, the canonical form, of each word. For instance, the verbs “talks,” “talked,” and “talking” are forms of the same lexeme, and its lemma is “talk”.
To process a piece of text in spaCy, we first need to load our language model and then call the model on a text corpus. The result is a Doc object, an object that holds the processed text.
I find it curious and even refreshing that in most cases the top nouns used by our dear heroes are mentions of the members of the crew.
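
Here is a minimal sketch of the cleaning and spaCy processing steps described in this article: dropping scene comments, using the character name only to know who said what, skipping stop words, and counting lemmas. It is not the exact code behind the analysis. The script layout (`CHARACTER: text` rows with stage directions in brackets or parentheses), the helper names `clean_script`, `top_lemmas` and `analyze_character`, and the choice of the `en_core_web_sm` model are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed (hypothetical) script layout: one spoken line per row, written as
# "CHARACTER: text", with stage directions in [brackets] or (parentheses).
STAGE_DIRECTION = re.compile(r"\[[^\]]*\]|\([^)]*\)")
SPEAKER_LINE = re.compile(r"^\s*([A-Z][A-Z .'-]*?)\s*:\s*(.+)$")


def clean_script(raw_script):
    """Group spoken text by character, dropping stage directions and names."""
    lines_by_character = defaultdict(list)
    for row in raw_script.splitlines():
        row = STAGE_DIRECTION.sub(" ", row)   # remove action/scene comments
        match = SPEAKER_LINE.match(row)
        if match is None:
            continue                          # blank row or pure scene description
        speaker, text = match.groups()
        text = " ".join(text.split())         # normalise whitespace
        if text:
            # The name is only the grouping key, never part of the corpus.
            lines_by_character[speaker.title()].append(text)
    return dict(lines_by_character)


def top_lemmas(doc, pos, n=10):
    """Most common lemmas for one part of speech, skipping stop words."""
    counts = Counter(
        token.lemma_.lower()
        for token in doc
        if token.pos_ == pos and not token.is_stop and token.is_alpha
    )
    return counts.most_common(n)


def analyze_character(lines, model_name="en_core_web_sm"):
    """Load a spaCy model, process one character's lines, return top words."""
    import spacy  # third-party; the model has to be downloaded separately

    nlp = spacy.load(model_name)      # load the language model
    doc = nlp(" ".join(lines))        # calling the model returns a Doc
    return {"NOUN": top_lemmas(doc, "NOUN"), "VERB": top_lemmas(doc, "VERB")}
```

With spaCy and a model installed, something like `analyze_character(clean_script(raw_script)["Thor"])` would return the top nouns and verbs for one character. The exact regular expressions would need adjusting to the real layout of the script file.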