ScriptLineCounter



Version: 1.8.1.0
Release: v1.8.1.0, 2015-04-17
Support: DonationCoder thread

Counts the lines in script files that follow one of the supported layouts (rtf, doc, docx, txt, pdf, odt, ods, csv), and writes every character name with the number of lines (dialogues) spoken to an Excel file.

Usage and examples can be found in the readme file in the package.

What's new:

2015-04-17, v1.8.1.0
Fixed: FileFormat5 didn't handle .csv files with semicolon separators, and the intermediate output (debug level 3) for this format lacked a newline after each line.
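The semicolon fix can be illustrated with a small delimiter check. This is only a sketch of one common approach (counting unquoted separators in the header line), not the code ScriptLineCounter actually uses; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of ScriptLineCounter: guesses the CSV separator
// by counting unquoted commas and semicolons in the header line.
final class DelimiterSniffer {

    // Returns ';' when the header holds more unquoted semicolons than commas, otherwise ','.
    static char detectDelimiter(String headerLine) {
        int commas = 0;
        int semicolons = 0;
        boolean inQuotes = false;
        for (char c : headerLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // separators inside quotes are data
            } else if (!inQuotes && c == ',') {
                commas++;
            } else if (!inQuotes && c == ';') {
                semicolons++;
            }
        }
        return semicolons > commas ? ';' : ',';
    }

    public static void main(String[] args) {
        System.out.println(detectDelimiter("Name;Text"));
        System.out.println(detectDelimiter("Name,Text"));
    }
}
```

Spreadsheet programs in locales that use a decimal comma often export .csv files with semicolons, which is the usual reason a parser needs to accept both.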
2014-03-21, v1.8.0.0
Added: FileFormat6, like FileFormat4 but allowing character names of 1 or 2 words. This may cause some miscounts, but was added on special request.
Updated: Refreshed libraries to POI v3.10-final and PdfBox v1.8.4
When Upgrading:
- Remove the lib subdirectory and replace all files
2013-06-21, v1.7.2.1
Fixed: Input of Name and Text column name was not transferred correctly to the running instance from the GUI.
Improved: GUI layout was a bit stretched.
2013-06-19, v1.7.2.0
Added: FileFormat 5 parameters -cn and -ct now also accept column numbers instead of a name. The first column is 1.
Improved: GUI now has fields for the Name and Text columns for FileFormat 5. Column numbers can be entered there as well.
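A minimal sketch of how a value given to -cn or -ct could be resolved, accepting either a 1-based column number or a column title. The class name, the case-insensitive title match and the -1 return value are assumptions made for illustration; the tool's real behaviour is described in its readme.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ScriptLineCounter.
final class ColumnSpec {

    // Resolves a column given as a 1-based number ("2") or as a header title ("Text")
    // to a zero-based index. Returns -1 when the column cannot be resolved.
    static int resolve(String spec, List<String> headers) {
        String trimmed = spec.trim();
        if (trimmed.matches("\\d+")) {
            try {
                int oneBased = Integer.parseInt(trimmed);
                return (oneBased >= 1 && oneBased <= headers.size()) ? oneBased - 1 : -1;
            } catch (NumberFormatException e) {
                return -1; // too many digits to be a valid column number
            }
        }
        for (int i = 0; i < headers.size(); i++) {
            // Assumption: titles are compared without regard to case.
            if (headers.get(i).trim().equalsIgnoreCase(trimmed)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("Name", "Text");
        System.out.println(resolve("1", headers));
        System.out.println(resolve("Text", headers));
    }
}
```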
2013-06-14, v1.7.1.0
Added: Handling of doc/docx contents as csv (FileFormat 5), including specifying the name and text column titles (not settable from the GUI). That file format may need to be set explicitly.
Added: Option -ci <lines> for ignoring the first lines of a file, 'skiplines' in the ini file
Added: Option -cm <max> for setting the max. number of words for a character-name (default 4), 'maxcharacternameparts' in the ini file
These new options are also available from the GUI
Improved: If debuglevel 3 or greater is set (-3 or -4), the loaded file-content is saved as text to a new file with the same name and '.txt.tmp' appended.
Minor: Slightly adjusted some GUI screen-labels.
2013-03-09, v1.7.0.0
Added: New FileFormat nr. 5, for reading .csv files, with a Name and Text column (column names can be configured)
Changed: Encoding (as used for PDF processing) can also be applied to .txt and .csv files, when set/changed from either the commandline or the GUI
When Upgrading:
- Remove the lib subdirectory and replace all files
v1.6.0.1
Bugfix: Autodetect fileformat from GUI stopped working.
2012-11-18, v1.6.0.0
Added: New FileFormat nr. 4, based on doc-files supplied by Saira, the original initiator of this tool.
Note: This format is quite similar to FileFormat 2, so it may need to be forcibly used on some files/filesets.
Added: Setting the FileFormat from the commandline using -ff parameter, or set from the ini file, as documented in the readme.
Changed: GUI now has the Fileformat combo box enabled, to force a specific file format to be used.
Improved: Some more robustness while handling the file contents.
Added: A warning in the readme file to NOT use Windows Notepad for editing the properties files, as it inserts a BOM in UTF-8 files, not supported by SLC.
When Upgrading:
- Remove the lib subdirectory and replace all files
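The Notepad warning in v1.6.0.0 concerns the three-byte UTF-8 byte order mark (EF BB BF) at the start of a file. The following sketch shows how such a mark can be detected in a file's bytes before the file is used; it is an illustration only, and the class is hypothetical rather than something shipped with SLC.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ScriptLineCounter.
final class BomCheck {

    // True when the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Returns the data without a leading BOM; data without a BOM is returned unchanged.
    static byte[] stripUtf8Bom(byte[] data) {
        return startsWithUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '=', 'b'};
        System.out.println(startsWithUtf8Bom(withBom));
        System.out.println(stripUtf8Bom(withBom).length);
    }
}
```

Reading the bytes of a properties file with `java.nio.file.Files.readAllBytes` and passing them to `startsWithUtf8Bom` is one way to check a file that was saved from an editor whose BOM behaviour is unknown.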
2012-05-07, v1.5.1.0
Improved: Updated POI Library from 3.8-beta6 to 3.8 Release.
When Upgrading:
- Remove lib directory and replace all files during unpacking
2012-04-30, v1.5.0.0
Changed: Replaced my own simple logging system by Log4J logging, as that is already required by some 3rd party libs
2012-04-09, v1.4.2.0
Added: OpenOffice/LibreOffice .odt and .ods read capability (minimal support)
Added: More (optional) sheets if -xe specified: Name mappings and Ignored names
Improved: Extra info sheet now shows percentages with 1 decimal place, and has extra columns for text-lines found and lines recognized
Improved: Refactorings in code
2012-04-07, v1.4.1.0
Fixed: Exception when generating output but no episodes were found (no files?)
Improved: If a .doc file is actually a disguised .rtf file, then read it like rtf, and same for .docx
Improved: Remove non-breaking spaces from character names, as found in some .doc files
Added: ScriptLineCounter.exe built using Launch4j, to avoid having a Command prompt open during run, also disables Console output
Added: Messagebox feature when running from .exe, messages to console shown as messagebox when needed
Added: ScriptLineCounter-CharacterMapping.properties settings file to merge multiple characters into 1, for resolving some typos, supports Unicode
Added: ScriptLineCounter-CharacterNames.properties settings file to replace [CharacterNames] section in ini file, supports Unicode
Added: ScriptLineCounter-IgnoreCharacters.properties settings file to replace [IgnoreCharacters] section in ini file, supports Unicode
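The "disguised .rtf" improvement in v1.4.1.0 relies on the fact that an RTF document starts with the characters `{\rtf`, whatever its file extension says. A minimal sketch of such a signature check follows; the class name is hypothetical and this is not necessarily how the tool performs the detection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ScriptLineCounter.
final class RtfSniffer {

    // Every RTF document begins with this ASCII signature.
    private static final byte[] RTF_MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // True when the first bytes of a file match the RTF signature.
    static boolean looksLikeRtf(byte[] head) {
        if (head.length < RTF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < RTF_MAGIC.length; i++) {
            if (head[i] != RTF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] rtf = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeRtf(rtf));
    }
}
```

A file named .doc or .docx whose first bytes pass this check can then be handed to the RTF reader instead of the Word reader.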
2012-04-01, v1.4.0.0
Added: PDF read capability
Added: Parsing FileFormat 3, as supplied in pdf, nearly correct, to be validated
Added: -oc (OutputContent) option, no ini setting, displays all read file-content to console when -v also is set
Added: -im (IgnoreMinimalScore) option, no ini setting
Added: -xe option, Extra Info sheet to excel file, listing all files, the recognition percentage and the file-format detected
Improved: Overhauled GUI options and layout, added url-label with link to DC-forum thread
2012-03-26, v1.3.1.0
Added: Second file-format supported, <Character><dash/:/(react)><dialogue>
Improved: Output to Excel formatting somewhat nicer for .xls files
Improved: Some issues with GUI mode resolved, progressbar works, filename label almost...
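The second file format, written above as <Character><dash/:/(react)><dialogue>, can be pictured as a character name followed by a dash, a colon or a parenthesised reaction, and then the spoken text. The sketch below parses one such line with a regular expression. It is an interpretation of that notation only: the set of characters allowed in a name, the trimming and the class name are assumptions, and the tool's own detection rules may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ScriptLineCounter.
final class Format2Line {

    // name, then "-" or ":" or "(reaction)", then the dialogue text
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\p{L}[\\p{L}\\p{N}' ]*?)\\s*(?:[-:]|\\([^)]*\\))\\s*(.+)$");

    // Returns {characterName, dialogue}, or null when the line does not fit the pattern.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1).trim(), m.group(2).trim()};
    }

    public static void main(String[] args) {
        String[] parsed = parse("JOHN: Hello there");
        System.out.println(parsed[0] + " | " + parsed[1]);
    }
}
```

Because the name part is matched lazily, the first dash, colon or parenthesis on the line is treated as the separator, so a hyphen inside the dialogue does not shift the split.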
2012-03-21, v1.3.0.0
Added: GUI mode, but just in beta mode for now (-gui), using Swing with MigLayout, still a lot to add and improve
Improved: Layout according to received sample
2012-03-18, v1.2.1.0
(never released)
Improved: Sort by Lines is now enabled by default
Improved: Totals column is now bold
Improved: Episode title is now right-aligned to better mate with episode titles
Improved: Less verbose if matchVerbose was added from .ini
Added: SortByName (-n) overrides -s
Added: Alternate grey/white background (-a), default enabled
Added: Zero counts are now suppressed by default, -z enables zeroes
Improved: Code changes/refactoring
2012-03-18, v1.2.0.1
Fixed: a bug of not using correct case in character names
2012-03-17, v1.2.0.0
Improved: Spoken-lines detection
Improved: Detection of Characters with accented (unicode) characters
Improved: Episode-number detection
Improved: Count line for all characters if multiple characters speak
Added: Sort by total lines-count instead of in order of appearance (default)
Added: Filter detected lines for a selection of characters to console (-m)
Added: [CharacterNames] section in ScriptLineCounter.ini, for mapping Characters to Actor names
2012-03-17, v1.1.0.1
Bugfix to process .doc as intended.
Improved episode number/name detection method.
Added support for [IgnoreCharacters] section to ScriptLineCounter.ini to ignore specific names/false detections
2012-03-17, v1.1.0.0
Process .doc/.docx files, .txt added as a bonus, column titles can be set in .ini file (sample included), improved error handling.
2012-03-16, v1.0.0.0
Initial release, processes .rtf files only, output to .xls/.xlsx, command-line only.

  • 24.40 MB File Size
  • April 17, 2015 Release Date