Caps Cleaner Devlog - 003

By A.A. Lopez

    Haha! Yes, we're cooking with gas now! I've just finished the script for a basic version that takes multiple VTT files and updates them following the rules established in Devlog 002. I came across some minor issues along the way, and I've found solutions that work well for now.

Issue 1 - Punctuation

    This hurdle was easy to overcome using a substitution function that forces the letter following an ending punctuation mark to be capitalized. I also added a rule so that if the letter is "I" by itself, or if it is followed by ('m|'ll|'d|'ve), it gets capitalized as well. On its own, though, that leaves the very first letter of the caption file lowercase, so I had to add a rule that capitalizes the very first letter in the file too.
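Here's a minimal sketch of what those three rules could look like with Python's `re.sub`. The function name `sentence_case` and the exact patterns are my own illustration, not the script's actual code, and it assumes the caption text has already been pulled out of the VTT cue.

```python
import re


def _upper_second_group(match):
    # Keep group 1 (punctuation/whitespace) and uppercase group 2 (the letter).
    return match.group(1) + match.group(2).upper()


def sentence_case(text):
    """Lowercase all-caps caption text, then restore capitals by rule."""
    text = text.lower()
    # Rule 3: the very first letter of the text is always capitalized.
    text = re.sub(r"^(\W*)([a-z])", _upper_second_group, text, count=1)
    # Rule 1: the letter following ending punctuation (. ! ?) is capitalized.
    text = re.sub(r"([.!?]\s+)([a-z])", _upper_second_group, text)
    # Rule 2: "i" by itself, or followed by 'm, 'll, 'd, or 've, becomes "I".
    text = re.sub(r"(?<![\w'])i(?=(?:'m|'ll|'d|'ve)?(?![\w']))", "I", text)
    return text


print(sentence_case("HELLO THERE. I'M SURE I SAID IT! DID I? YES."))
```

The order matters a little here: everything is lowercased first, and the rules only ever add capitals back, so they don't step on each other.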

Issue 2 - Text Blocks & Formatting

    Here the problems got a bit more dicey. Not all sentences fit on a single line, so they have to be broken up, but the pieces can't all start with capital letters. So, in similar fashion to the punctuation rule, the first letter of a text block stays lowercase unless the preceding line ends in an ending punctuation mark.
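A rough sketch of that carry-over rule, assuming the cue text has already been split into a list of text blocks in order. The name `case_blocks` and the choice to ignore trailing quotes and parentheses when checking for ending punctuation are assumptions for illustration.

```python
import re

END_PUNCT = (".", "!", "?")


def case_blocks(blocks):
    """Lowercase each block; capitalize its first letter only when the
    previous block closed a sentence (or when it is the first block)."""
    result = []
    starts_sentence = True  # the first block always opens a sentence
    for block in blocks:
        text = block.lower()
        if starts_sentence:
            # Uppercase only the first letter found in the block.
            text = re.sub(r"[a-z]", lambda m: m.group(0).upper(), text, count=1)
        result.append(text)
        # Look past trailing whitespace, quotes, and parentheses.
        tail = text.rstrip(" \t\r\n\"')")
        if tail:  # empty blocks don't change the state
            starts_sentence = tail.endswith(END_PUNCT)
    return result


print(case_blocks(["THE STORM ROLLED IN", "BEFORE WE REACHED SHORE.", "THEN IT STOPPED."]))
```

The only state carried between blocks is a single flag, which keeps the rule easy to combine with the punctuation pass.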

    Within those parameters another issue arises: multiple speakers. Under standard caption regulations, most speakers are denoted by IDs like [Name] or by speaker-change dashes (-). There are other factors too, but I won't include those here for brevity's sake. So, taking all the information gathered from the other issues, I set up a stripping function that rebuilds the sentence and adjusts it based on the rulesets. Now if that sounds like a cop-out way to finish this thought, it is. It's late and I want to go to bed, so I'm just going to go through my last issue and greatest achievement.
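To give a rough idea of the strip-and-rebuild approach, here is a small sketch. It assumes a speaker marker is an optional leading dash followed by an optional [Name] ID, that the marker itself is left untouched, and that a speaker change always starts a new sentence. The name `case_line` and those assumptions are mine, and the real function handles more cases than this.

```python
import re

# Group 1: optional leading dash and/or [SPEAKER ID]. Group 2: the spoken text.
PREFIX = re.compile(r"^(\s*(?:-\s*)?(?:\[[^\]]*\]\s*)?)(.*)$", re.DOTALL)


def case_line(line):
    """Strip the speaker marker, fix the case of the body, then rebuild."""
    prefix, body = PREFIX.match(line).groups()
    body = body.lower()
    if prefix.strip():
        # Assumption: a new speaker always begins a new sentence.
        body = re.sub(r"[a-z]", lambda m: m.group(0).upper(), body, count=1)
    return prefix + body


for line in ["[MARIA] WHERE ARE YOU?", "- RIGHT HERE.", "AND STILL WAITING"]:
    print(case_line(line))
```

Lines without a marker come back fully lowercased, so the block-to-block rule from earlier still decides whether they get a capital.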

Issue 3 - Proper Nouns & Names

    My nemesis throughout this whole ordeal of learning Python, making an app, and everything else: proper nouns. Names are everywhere, and they are some of the hardest things to account for when making an app that focuses on changing AllCaps to MixedCase while following basic grammar rules. So what to do?

  • Option 1 - Just have a growing list in the code of all proper names and nouns that has to be consistently updated? - No, too cumbersome
  • Option 2 - Create a separate file that gets updated when people add new names through the app? - Intermediate, and useful, but it takes too long and there's always a chance that one or more might get missed.
  • Option 3 - Scrape the web? - Ding, ding, ding! Especially for the work I do.

    I'm not going to take all the credit for this one; a coworker mentioned it, and I just decided to push through with it. In our job we need to grab the IMDb ID (the little tt# in the URL) for each film and series we work on, as a way to track them and stay in compliance with contracts. So if we already need it, why not use it to our advantage?

    So by using the wonderful module BeautifulSoup I was able to cobble together a halfway decent web scraper. However, IMDb alternates its code to thwart folks trying to scrape the site for info, which I honestly applaud; it's always good to be wary. I was able to find another route to get the info I needed from the site, though I'm keeping that secret close to the chest. Now, when names need to be added to the list of proper nouns, all you need to do is type or paste in the ID and press a button, and they are automatically added to a separate file that gets updated (see, I combined options 2 & 3).
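The scraping route stays under wraps, but the "separate file that gets updated" half can be sketched. This assumes a plain text file with one name per line; `add_names`, `apply_proper_nouns`, and the file layout are all hypothetical stand-ins, not the app's real code.

```python
import re
from pathlib import Path


def add_names(path, new_names):
    """Merge new names into a one-name-per-line file, skipping duplicates."""
    path = Path(path)
    known = set()
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        known.update(line.strip() for line in lines if line.strip())
    known.update(name.strip() for name in new_names if name.strip())
    path.write_text("\n".join(sorted(known)) + "\n", encoding="utf-8")
    return known


def apply_proper_nouns(text, names):
    """Restore the stored capitalization of every known name in the text."""
    # Longest names first, so full names are handled before single words.
    for name in sorted(names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function replacement avoids backslash surprises in the name.
        text = re.sub(pattern, lambda m, n=name: n, text, flags=re.IGNORECASE)
    return text
```

With a names file containing "Maria" and "Tokyo", `apply_proper_nouns("i met maria in tokyo.", names)` would give back "i met Maria in Tokyo.", leaving the rest of the casing to the earlier rules.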


    During this past week, with everything going on, having something like this as a distraction has been really helpful. We'll see how useful it is once I get some of the more fun stuff implemented. I think I should start looking into creating a proper plan. But for now, that's this devlog complete. See you guys next time.
