====== File Management ======

  * Guidelines that Chih-Horng Kuo (chk@gate.sinica.edu.tw) developed for our group members. Suggestions are welcome.
  * Related information:
    * [[tutorials:scientific_presentation|Scientific Presentations]]
    * [[tutorials:scientific_writing|Scientific Writing]]

===== Backup =====

  * Critical!
  * File loss happens. Not __IF__, but __WHEN__.
  * Establish an automatic backup plan; manual backup plans are not practical
    * Mac: Use "Time Machine" to back up automatically to another hard drive
    * Linux: Use rsync scripts with a cron job

===== Working with collaborators =====

  * Discuss and set up a workflow
  * Avoid emailing files as attachments; use a file server or file sharing service (e.g., Google Drive) instead
  * Setting up a file server vs. using a commercial service
    * File server
      * Pros: Full control, large storage, no vendor lock-in
      * Cons: Requires hardware, setup, and maintenance
    * File sharing service
      * Pros: Easy, low cost
      * Cons: Limited space, less control over structure, potential data leakage

===== File directory structure =====

  * Make it easy to identify the parts that need to be included in backup
  * Example for personal desktops/laptops
    * A few top-level folders with clear naming for easy backup/restore (e.g., work_files, personal_files)
  * Example for shared servers
    * Depends on the research group, for example: data/, lab_doc/, project/, conference/
    * Clear rules, communication, and enforcement are all very important

===== Naming and version control =====

  * File names
    * Do: short and informative. Include information such as project id, keyword (e.g., manuscript, report, keyword of figure/table), and version or date
    * Avoid:
      * Long names that are difficult to read (e.g., full title of the manuscript)
      * Short names that are not informative (e.g., manuscript.docx, figure1.ai)
      * Spaces or special characters (can cause problems across systems)
      * Names such as "xxx_final.docx", "xxx_final_revised.docx", "xxx_final_revised_typofixed.docx", etc.
  * Versioning
    * Version number should be the last part of the file name; two digits should be sufficient
    * Optional: Add initials if multiple people are involved in the project
    * When in doubt, save as a new version
    * Recommended: In early versions of manuscript drafts, add a short section at the beginning of the document to explain the major changes made
    * Benefits
      * Write as much as possible without worrying about whether those parts will be kept in later versions
      * Throw away as much as needed to make a good story; you can always go back to previous versions to retrieve the deleted materials
  * Project progress report
    * At the beginning of the document, include:
      * Name (__who__ prepared this report?)
      * Date (__when__ was it prepared?)
      * Project id (__what__ is it about?)
      * Directory of relevant files (on lab servers, so that people can get the raw data files if needed)
    * For long reports, prepare a short summary section
  * Branching and merging
    * When multiple people are involved, branches may be created for each person to work on a different part
    * It is important that everyone agrees with the leader on when and how to merge the branches
    * Example
      * For project "agro38", start the main manuscript file as "agro38_ms_v01.docx"
      * ABC saves the file as "agro38_ms_v02_ABC.docx" to work on Introduction
      * DEF creates "agro38_ms_v02_DEF.docx" to work on Materials and Methods
      * GHI creates "agro38_ms_v02_GHI.docx" to work on Figure Legend
      * ABC, as the project leader, is responsible for setting the deadline, collecting the files, and merging them to create "agro38_ms_v03.docx" as the starting point for the next iteration
  * Figure/table files
    * Before finalizing the order, use names with a keyword and without a figure/table number (e.g., "fig_phylogeny_v02.ai", "table_accession_v05.xlsx")
    * After finalizing the order, put all early versions into a separate folder (e.g., "figure_stage1"), then add figure/table numbers to the file names (e.g., "fig1_phylogeny_v15.ai")
    * If the order changes, put the previous versions into another separate folder (e.g., "figure_stage2"), then update the figure/table numbers (e.g., "fig3_phylogeny_v16.ai")
    * Very important: save the files in editable formats (e.g., .ai, not just .jpg)

===== Tracking changes =====

  * For Word files: Use the built-in "Track Changes" function by default when collaborating; the "Compare Documents" function can also be used later
  * For Google Docs: All changes are automatically tracked. Manually download the major versions and save them with meaningful names (e.g., "agro38_ms_v05_ABC.docx")

===== Metadata =====

  * Metadata = data that explains your files: what they are, where they came from, how to use them
  * Examples
    * For the raw data file of a gel image (20250509_1428.jpg), add a text file (20250509_1428.txt or 20250509_1428.docx) to explain what this image is (when, who, which project, content of each lane, why it was generated, main conclusion, etc.)
    * For a set of raw data files (e.g., Sanger sequencing results) or a file folder, provide a "readme.txt" to explain the content
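
The naming rules in "Naming and version control" can be checked mechanically. The following is a minimal Python sketch, not part of the guidelines themselves. It assumes one possible reading of the convention: a stem made of letters, digits, underscores, or hyphens, followed by "_v" and a two-digit version, optional uppercase initials, and a file extension. The pattern and the function names (parse_name, next_version) are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <stem>_v<NN>[_<INITIALS>].<ext>
# The stem allows only letters, digits, "_" and "-" (no spaces or
# special characters). Initials are assumed to be 2-4 uppercase letters.
NAME_PATTERN = re.compile(
    r"(?P<stem>[A-Za-z0-9][A-Za-z0-9_-]*)"
    r"_v(?P<version>\d{2})"
    r"(?:_(?P<initials>[A-Z]{2,4}))?"
    r"\.(?P<ext>[A-Za-z0-9]+)"
)


class ParsedName(NamedTuple):
    stem: str
    version: int
    initials: Optional[str]
    ext: str


def parse_name(filename: str) -> Optional[ParsedName]:
    """Return the parts of a file name, or None if it breaks the assumed convention."""
    m = NAME_PATTERN.fullmatch(filename)
    if m is None:
        return None
    return ParsedName(m["stem"], int(m["version"]), m["initials"], m["ext"])


def next_version(filename: str, initials: Optional[str] = None) -> str:
    """Build the name for the next version, optionally tagged with initials."""
    parsed = parse_name(filename)
    if parsed is None:
        raise ValueError(f"name does not follow the convention: {filename!r}")
    if parsed.version >= 99:
        raise ValueError("two-digit version numbers are exhausted")
    suffix = f"_{initials}" if initials else ""
    return f"{parsed.stem}_v{parsed.version + 1:02d}{suffix}.{parsed.ext}"


if __name__ == "__main__":
    # ABC branches from the main file to work on one section.
    print(next_version("agro38_ms_v01.docx", "ABC"))  # agro38_ms_v02_ABC.docx
    # The project leader merges the branches into the next main version.
    print(next_version("agro38_ms_v02_ABC.docx"))     # agro38_ms_v03.docx
    # Names of the "xxx_final_revised" kind are rejected.
    print(parse_name("xxx_final_revised.docx"))       # None
```

A check like this could be run over a project folder to list files that do not follow the convention. The exact pattern should be adjusted to whatever rules the group agrees on.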