{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":612538130,"defaultBranch":"master","name":"rllama","ownerLogin":"Noeda","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-03-11T08:41:15.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/833719?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1680893931.0","currentOid":""},"activityList":{"items":[{"before":"059c948ade11b37d6b306561733ed5355426e0db","after":"1e1131faaaf7013ed19639ad96f252458efdb45b","ref":"refs/heads/master","pushedAt":"2023-04-09T00:49:44.000Z","pushType":"pr_merge","commitsCount":3,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Merge pull request #6 from DaniAsh551/master\n\nRevised nvidia docker support","shortMessageHtmlLink":"Merge pull request <a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"1659484473\" data-permission-text=\"Title is private\" data-url=\"https://github.com/Noeda/rllama/issues/6\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/Noeda/rllama/pull/6/hovercard\" href=\"https://github.com/Noeda/rllama/pull/6\">#6</a> from DaniAsh551/master"}},{"before":"8d6897ee70033a661c19cab2a2e9139f45ffb33e","after":"059c948ade11b37d6b306561733ed5355426e0db","ref":"refs/heads/master","pushedAt":"2023-04-07T18:59:30.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Name a CPU-only Dockerfile since the NVidia turned out to be complicated to make actually work due to NVidia's own shenanigans.\n\nIt seems to be difficult to get OpenCL powered by NVidia inside a\ncontainer. My Linux did not have the necessary packages in repositories\n(Fedora) to expose NVidia GPUs inside docker.","shortMessageHtmlLink":"Name a CPU-only Dockerfile since the NVidia turned out to be complica…"}},{"before":null,"after":"059c948ade11b37d6b306561733ed5355426e0db","ref":"refs/heads/pr-5","pushedAt":"2023-04-07T18:58:51.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Name a CPU-only Dockerfile since the NVidia turned out to be complicated to make actually work due to NVidia's own shenanigans.\n\nIt seems to be difficult to get OpenCL powered by NVidia inside a\ncontainer. My Linux did not have the necessary packages in repositories\n(Fedora) to expose NVidia GPUs inside docker.","shortMessageHtmlLink":"Name a CPU-only Dockerfile since the NVidia turned out to be complica…"}},{"before":"61b68c6e1c8c6428f3fdf0a28abb836cf3cebf67","after":"8d6897ee70033a661c19cab2a2e9139f45ffb33e","ref":"refs/heads/master","pushedAt":"2023-04-07T05:33:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Fix interactive system prompt overwriting --prompt.\n\nOops.","shortMessageHtmlLink":"Fix interactive system prompt overwriting --prompt."}},{"before":"7aa10512f96a70e77327fc5aba2349a476a15ce0","after":"61b68c6e1c8c6428f3fdf0a28abb836cf3cebf67","ref":"refs/heads/master","pushedAt":"2023-04-06T17:05:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Small fix to README.md on the Vicuna13B interactive thing.","shortMessageHtmlLink":"Small fix to README.md on the Vicuna13B interactive thing."}},{"before":"4faf07a8ae7c62fbdc0b97b9c78494eeabc366b0","after":"7aa10512f96a70e77327fc5aba2349a476a15ce0","ref":"refs/heads/master","pushedAt":"2023-04-06T17:01:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Adjust default interactive settings so they match what Vicuna-13B expects.","shortMessageHtmlLink":"Adjust default interactive settings so they match what Vicuna-13B exp…"}},{"before":"f5328ab5bd62fe9bd930539382b13e9033434a0b","after":"4faf07a8ae7c62fbdc0b97b9c78494eeabc366b0","ref":"refs/heads/master","pushedAt":"2023-04-06T00:18:56.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Update benchmark numbers in README.md after new OpenCL changes.","shortMessageHtmlLink":"Update benchmark numbers in README.md after new OpenCL changes."}},{"before":"746fc56c8b56142870a21ef58bb9a6ed287ab4c3","after":"f5328ab5bd62fe9bd930539382b13e9033434a0b","ref":"refs/heads/master","pushedAt":"2023-04-06T00:13:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Force f16 if OpenCL is on, because otherwise we will crash.","shortMessageHtmlLink":"Force f16 if OpenCL is on, because otherwise we will crash."}},{"before":"4a1e2950114eac7b96056bd75166959ef8d3c7af","after":"746fc56c8b56142870a21ef58bb9a6ed287ab4c3","ref":"refs/heads/master","pushedAt":"2023-04-05T23:23:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Write special OpenCL kernel when left side matrix has only 1 row.\n\nThis speeds up things quite a bit. I think I had ~570ms on Vicuna-13B\nbefore this and now around ~440ms.","shortMessageHtmlLink":"Write special OpenCL kernel when left side matrix has only 1 row."}},{"before":"e4af9d9d515cdc1ae4894094283a7e4e821eb667","after":"4a1e2950114eac7b96056bd75166959ef8d3c7af","ref":"refs/heads/master","pushedAt":"2023-04-05T03:18:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Update goals, as Stanford's model is no longer that relevant in the face of Vicuna-13B and Vicuna-13B works.","shortMessageHtmlLink":"Update goals, as Stanford's model is no longer that relevant in the f…"}},{"before":"35a3a5c32a3b2e6674b8795025168110854d2973","after":"e4af9d9d515cdc1ae4894094283a7e4e821eb667","ref":"refs/heads/master","pushedAt":"2023-04-05T01:24:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Add support to only load a model partially to GPU.\n\nThis was surprisingly easy to add.","shortMessageHtmlLink":"Add support to only load a model partially to GPU."}},{"before":"88cb87634ed0e7269ed13a0b885b4566a989c927","after":"35a3a5c32a3b2e6674b8795025168110854d2973","ref":"refs/heads/master","pushedAt":"2023-04-05T00:57:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Mention that Vicuna-13B works in the README.md","shortMessageHtmlLink":"Mention that Vicuna-13B works in the README.md"}},{"before":"e27e649d9458f82c3415d2b0e3eb4e5d0aef6816","after":"88cb87634ed0e7269ed13a0b885b4566a989c927","ref":"refs/heads/master","pushedAt":"2023-04-05T00:56:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Fix warnings.","shortMessageHtmlLink":"Fix warnings."}},{"before":"48244abe5cd02499c1a5121b9fa60daf5abb4e53","after":"e27e649d9458f82c3415d2b0e3eb4e5d0aef6816","ref":"refs/heads/master","pushedAt":"2023-04-05T00:33:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Add Huggingface model loading (Vicuna-13B) support.\n\nThis doesn't have nice UX at all for Vicuna-13B but I can confirm it\nworks if fed an appropriate prompt from command line.\n\nNext step is probably make a nice UX for the chatter.","shortMessageHtmlLink":"Add Huggingface model loading (Vicuna-13B) support."}},{"before":"4c99e7fe41fb9d12c1539f60d685a9f6baae100d","after":"48244abe5cd02499c1a5121b9fa60daf5abb4e53","ref":"refs/heads/master","pushedAt":"2023-04-02T18:21:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Fix some awkward wording in README.md","shortMessageHtmlLink":"Fix some awkward wording in README.md"}},{"before":"f6249e8d9fa1a0be7000cfb56d9ca73971bd2584","after":"4c99e7fe41fb9d12c1539f60d685a9f6baae100d","ref":"refs/heads/master","pushedAt":"2023-04-02T18:20:15.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Don't report interactive stop token sequence if we are not interactive mode.","shortMessageHtmlLink":"Don't report interactive stop token sequence if we are not interactiv…"}},{"before":"19e552e1ea7b73ecb3f828bf6bbf09e3e6a81f09","after":"4c99e7fe41fb9d12c1539f60d685a9f6baae100d","ref":"refs/heads/pr-3","pushedAt":"2023-04-02T18:20:08.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Don't report interactive stop token sequence if we are not interactive mode.","shortMessageHtmlLink":"Don't report interactive stop token sequence if we are not interactiv…"}},{"before":"39f69f24fbce103130de9fc5891538dba0369039","after":"19e552e1ea7b73ecb3f828bf6bbf09e3e6a81f09","ref":"refs/heads/pr-3","pushedAt":"2023-04-02T17:52:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Erase debug println!() I had.","shortMessageHtmlLink":"Erase debug println!() I had."}},{"before":null,"after":"39f69f24fbce103130de9fc5891538dba0369039","ref":"refs/heads/pr-3","pushedAt":"2023-04-02T17:51:32.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Document interactive mode, run rustfmt on rllama_main.rs","shortMessageHtmlLink":"Document interactive mode, run rustfmt on rllama_main.rs"}},{"before":"8cc82ae7e20256f432d0e0c16c9fe384dedafa8b","after":"d7d13cd474eb28faa3be0672232eb5387ff8f4ed","ref":"refs/heads/k4bit","pushedAt":"2023-03-23T09:11:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Bucketize the 4-bit quantization for more accuracy.\n\nCurrently going with 512 columns per bucket. Need to test a bit should I\ngo even smaller, 256 columns per bucket.","shortMessageHtmlLink":"Bucketize the 4-bit quantization for more accuracy."}},{"before":"2f3e9bc0f5e6e482efdaea40c53cfbab02ef9687","after":"8cc82ae7e20256f432d0e0c16c9fe384dedafa8b","ref":"refs/heads/k4bit","pushedAt":"2023-03-23T05:56:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Make separate matrix_vector_muls for 4-bit quantization rather than using matrix_mul for them.","shortMessageHtmlLink":"Make separate matrix_vector_muls for 4-bit quantization rather than u…"}},{"before":"b8946da2d8fe78380ca07734e3a80f6640c57b1c","after":"2f3e9bc0f5e6e482efdaea40c53cfbab02ef9687","ref":"refs/heads/k4bit","pushedAt":"2023-03-23T04:57:33.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"K4 bit inference works now. Performance isn't as good as I'd like it to be though.","shortMessageHtmlLink":"K4 bit inference works now. Performance isn't as good as I'd like it …"}},{"before":null,"after":"b8946da2d8fe78380ca07734e3a80f6640c57b1c","ref":"refs/heads/k4bit","pushedAt":"2023-03-23T04:06:13.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Implement matrix multiplication for 4-bit * 32-bit floats.\n\nAs of this commit, test works. But I want to optimize this a bit, seeing\nif increasing load instruction : arithmetic instruction ratio will make\nsingle-threaded performance a bit speedier.","shortMessageHtmlLink":"Implement matrix multiplication for 4-bit * 32-bit floats."}},{"before":"26f343ad1599aafc51ff68f72e493a859c6b29dd","after":"f6249e8d9fa1a0be7000cfb56d9ca73971bd2584","ref":"refs/heads/master","pushedAt":"2023-03-21T08:15:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Add skeleton code for 4-bit quantization.\n\nThe type is now recognized and I have a very simple quantizer too but no\noperations are done yet.","shortMessageHtmlLink":"Add skeleton code for 4-bit quantization."}},{"before":"957a8f9f98f91632b86a8dbf3d8dac28d4e5addb","after":"26f343ad1599aafc51ff68f72e493a859c6b29dd","ref":"refs/heads/master","pushedAt":"2023-03-21T02:15:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Add a flag that will exit the HTTP server after just one query.\n\nThis is for some experiments I want to run to kill the server gracefully\nwhenever I pull out the logits out of it from a Python script.","shortMessageHtmlLink":"Add a flag that will exit the HTTP server after just one query."}},{"before":"5e241722cb246b9b1011c1d9ad8688b167e68726","after":"957a8f9f98f91632b86a8dbf3d8dac28d4e5addb","ref":"refs/heads/master","pushedAt":"2023-03-21T02:02:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Mention that `server` feature must be turned on to use the inference API.","shortMessageHtmlLink":"Mention that <code>server</code> feature must be turned on to use the inference …"}},{"before":"d85ed7f23e464b2657fff7dfe0cd3d79d87676bd","after":"5e241722cb246b9b1011c1d9ad8688b167e68726","ref":"refs/heads/master","pushedAt":"2023-03-21T01:30:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Fix compilation when opencl feature is being used.","shortMessageHtmlLink":"Fix compilation when opencl feature is being used."}},{"before":"a8320613a129e040be5b4c4c7c3ebbccf59f4dc8","after":"d85ed7f23e464b2657fff7dfe0cd3d79d87676bd","ref":"refs/heads/master","pushedAt":"2023-03-21T01:29:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Mention HTTP server in features in README.md","shortMessageHtmlLink":"Mention HTTP server in features in README.md"}},{"before":"b9be485610edfbb3bfc97633bfa72c436f7a5516","after":"a8320613a129e040be5b4c4c7c3ebbccf59f4dc8","ref":"refs/heads/master","pushedAt":"2023-03-21T01:29:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Fix some things in README.md after proofreading it and removing lies.","shortMessageHtmlLink":"Fix some things in README.md after proofreading it and removing lies."}},{"before":"9c86c17318623834877a20bbf51ef461956242d6","after":"b9be485610edfbb3bfc97633bfa72c436f7a5516","ref":"refs/heads/master","pushedAt":"2023-03-21T01:26:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Noeda","name":"Mikko Juola","path":"/Noeda","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/833719?s=80&v=4"},"commit":{"message":"Add simple HTTP API support.\n\nIt took annoyingly a lot of effort just to make this simple server.\n\nI tried rouille web framework first, but it didn't support getting\nchunked output to the client line-by-line. (seems that if it exposed\nmore details about the underlying tiny-http package I could have hacked\nit to work).\n\nI went with Rocket because it had less async stuff and seemed decent.\n\nI got weird issues where it seemed as if memory use kept increasing and\nincreasing. I may have got that fixed but I couldn't figure out what\nmade it use so much memory, even tools like valgrind and heaptrack told\nme there isn't that much memory allocated but I can see RES increasing\nin `htop`.\n\nSwitched to MiMalloc as it seems to slightly decrease memory use.\n\nAdded details about the inference server to README.md. And also added an\nexample Python script of it.\n\nI want to use this feature to later investigate how much do\nquantizations or f16/f32 affect output. Easier to do such things on\nPython.","shortMessageHtmlLink":"Add simple HTTP API support."}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADFNUARwA","startCursor":null,"endCursor":null}},"title":"Activity · Noeda/rllama"}