Monday, June 9, 2025

Caps Cleaner Devlog - 003

By A.A. Lopez

    Haha! Yes, we're cooking with gas now! I've just finished the script for a basic version that takes multiple VTT files and updates them following the rules established in Devlog 002. There were some minor issues that I came across along the way, and I've found solutions that work well for now.

Issue 1 - Punctuation

    This hurdle was easy to overcome using a substitution function that forces the letter following an ending punctuation mark to be capitalized. I also added a rule that if the letter is "I" by itself, or if it is followed by ('m|'ll|'d|'ve), it will be capitalized as well. On its own, though, that leaves the very first letter of the caption file uncapitalized, so I had to add a rule that the very first letter in the caption file gets capitalized too.
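
    A minimal sketch of how those three rules could look with Python's re module. The function name and the regexes are illustrative assumptions, not the actual Caps Cleaner code, and the sketch only handles ASCII letters.

```python
import re


def sentence_case(text: str) -> str:
    """Lowercase all-caps caption text, then re-capitalize by rule.

    Hypothetical helper for illustration; not the app's real function.
    """
    text = text.lower()

    # Rule 1: capitalize the letter that follows ending punctuation (. ! ?).
    text = re.sub(
        r"([.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

    # Rule 2: capitalize "i" when it stands alone or is followed by
    # 'm, 'll, 'd, or 've. The lookarounds keep words like "it" untouched.
    text = re.sub(r"(?<![\w'])i(?='(?:m|ll|d|ve)\b|(?![\w']))", "I", text)

    # Rule 3: capitalize the very first letter of the caption text.
    # If the first letter is already uppercase, nothing changes.
    text = re.sub(
        r"^([^A-Za-z]*)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
        count=1,
    )
    return text


print(sentence_case("HELLO THERE. I'M SURE I SAW IT! WHERE IS IT?"))
# Hello there. I'm sure I saw it! Where is it?
```

    Lowercasing everything first and then capitalizing by rule keeps each substitution small, since every rule only has to look for lowercase letters.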

Issue 2 - Text Blocks & Formatting

    Here the problems got a bit more dicey. Not all sentences fit on a single line, so they have to be broken up, but the pieces can't all start with capital letters. So, in a similar fashion to the punctuation rule, the first letter of a text block will be lowercase unless the preceding line ends in an ending punctuation mark.
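
    The carry-over rule between text blocks can be sketched like this. It assumes the caption text has already been pulled out of the VTT file into a list of strings, and the names (case_blocks, END_PUNCT) are made up for the example.

```python
import re

# Assumption: these three marks are the "ending punctuation" set.
END_PUNCT = (".", "!", "?")


def case_blocks(blocks: list[str]) -> list[str]:
    """Lowercase each block; capitalize its first letter only when the
    previous block finished a sentence. Hypothetical helper."""
    result = []
    starts_sentence = True  # the very first block always opens a sentence
    for block in blocks:
        text = block.lower()
        if starts_sentence:
            # Uppercase only the first letter found in the block.
            text = re.sub(r"[a-z]", lambda m: m.group(0).upper(), text, count=1)
        result.append(text)

        stripped = text.rstrip()
        if stripped:  # empty blocks do not change the sentence state
            starts_sentence = stripped.endswith(END_PUNCT)
    return result


print(case_blocks(["THE QUICK BROWN FOX", "JUMPED OVER THE DOG.", "THEN IT RAN."]))
# ['The quick brown fox', 'jumped over the dog.', 'Then it ran.']
```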

    Now within those parameters another issue arises: multiple speakers. Under standard caption regulations, most speakers are denoted by IDs like [Name] or by speaker change dashes (-). There are other factors too, but I won't include those here for brevity's sake. So, taking all the information gathered from the other issues, I set up a stripping function that rebuilds the sentence and adjusts it based on the rulesets. Now if that sounds like a cop-out way to finish this thought, it is. It's late and I want to go to bed, so I'm just going to go through my last issue and greatest achievement.
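
    One possible shape for a strip-and-rebuild step, as a rough sketch only. It assumes speaker markers always sit at the start of a line, that the ID itself is left untouched, and that a new speaker always starts a new sentence. None of that is confirmed by the actual script, and the names below are hypothetical.

```python
import re

# Leading speaker marker: an optional dash, an optional bracketed ID
# such as [NAME], or both. Everything after it is the spoken text.
SPEAKER = re.compile(r"^(\s*(?:-\s*)?(?:\[[^\]]*\]\s*)?)(.*)$", re.DOTALL)


def split_speaker(line: str) -> tuple[str, str]:
    """Strip the speaker marker off the front of a caption line."""
    m = SPEAKER.match(line)  # always matches, since every part is optional
    return m.group(1), m.group(2)


def case_line(line: str) -> str:
    """Rebuild the line: marker kept as-is, spoken text re-cased."""
    prefix, body = split_speaker(line)
    body = body.lower()
    if prefix.strip():
        # Assumption: a speaker change begins a new sentence.
        body = body[:1].upper() + body[1:]
    return prefix + body


print(case_line("- [JOHN] WHERE ARE YOU?"))  # - [JOHN] Where are you?
print(case_line("AND THEN WE LEFT."))        # and then we left.
```

    Lines without a marker come back fully lowercase here, so in practice they would still need the punctuation and text block rules applied on top.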

Issue 3 - Proper Nouns & Names

    My nemesis throughout this whole ordeal of learning Python, making an app, and everything: proper nouns. Names are everywhere, and they are some of the hardest things to account for when making an app that focuses on changing AllCaps to MixedCase while following basic grammar rules. So what to do?

  • Option 1 - Just have a growing list in the code of all proper names and nouns that has to be constantly updated? - No, too cumbersome.
  • Option 2 - Create a separate file that gets updated when people add new names through the app? - Intermediate, and useful, but it takes too long and there's always a chance that one or more might get missed.
  • Option 3 - Scrape the web? - Ding, ding, ding! Especially for the work I do.

    I'm not going to take all the credit for this one. A coworker mentioned it, and I just decided to push through with it. In our job we need to grab the IMDb ID (the little tt# in the URL) for each film and series we work on, as a way to track them and stay in compliance with contracts. So if we already need it, why not use it to our advantage?

    So by using the wonderful module BeautifulSoup I was able to cobble together a halfway decent web scraper. However, IMDb uses alternating code to obfuscate the site against folks trying to scrape it for info, which I honestly applaud; it's always good to be wary. I was able to find another route to get the info I needed from the site, though I'm keeping that secret close to the chest. Now when names need to be added to the list of proper nouns, all you need to do is type or paste in the ID and press a button, and they are automatically added to a separate file that gets updated (see, I combined options 2 & 3).
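
    The scraping route stays out of this sketch on purpose. What it does show is the "separate file" half: merging a batch of names into a plain text file (one name per line, which is an assumed format) and then using that list to restore capitalization. The function names and the example names are placeholders.

```python
import re
from pathlib import Path


def add_names(path: Path, new_names: list[str]) -> list[str]:
    """Merge new names into the names file, de-duplicated and sorted."""
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    merged = sorted({n.strip() for n in existing + new_names if n.strip()})
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged


def apply_proper_nouns(text: str, names: list[str]) -> str:
    """Replace any-case matches of each name with its stored spelling."""
    # Longest names first, so a full name wins over a shorter one inside it.
    for name in sorted(names, key=len, reverse=True):
        # Lookarounds act as word boundaries, so a name never matches
        # in the middle of a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        text = re.sub(pattern, lambda m, n=name: n, text, flags=re.IGNORECASE)
    return text


names_file = Path("proper_nouns.txt")  # hypothetical file name
names = add_names(names_file, ["Ripley", "Dallas"])
print(apply_proper_nouns("where is ripley? ask dallas.", names))
# where is Ripley? ask Dallas.
```

    A plain name list like this can't tell a name from a common word that happens to be spelled the same, so some matches would still need a human check.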


    During this past week, with everything going on, having something like this as a distraction has been really helpful. We'll see how useful it is once I get some of the more fun stuff implemented. I think I should start looking into creating a proper plan. But for now, that's this devlog complete. See you guys next time.
